From: "Summers, Stuart" <stuart.summers@intel.com>
To: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
"Wajdeczko, Michal" <Michal.Wajdeczko@intel.com>
Cc: "Brost, Matthew" <matthew.brost@intel.com>,
"thomas.hellstrom@linux.intel.com"
<thomas.hellstrom@linux.intel.com>
Subject: Re: [PATCH] drm/xe/pf: Move VFs reprovisioning to worker
Date: Mon, 27 Jan 2025 18:28:24 +0000 [thread overview]
Message-ID: <5378e4e0677584aa79d4505895226b8913a6e1ff.camel@intel.com> (raw)
In-Reply-To: <27108b00-20ad-4064-a79d-622e520ec4a4@intel.com>
On Mon, 2025-01-27 at 19:05 +0100, Michal Wajdeczko wrote:
>
>
> On 27.01.2025 18:07, Summers, Stuart wrote:
> > On Sat, 2025-01-25 at 22:55 +0100, Michal Wajdeczko wrote:
> > > Since the GuC is reset during GT reset, we need to re-send the
> > > entire SR-IOV provisioning configuration to the GuC. But since
> > > this whole configuration is protected by the PF master mutex and
> > > we can't avoid making allocations under this mutex (like during
> > > LMEM provisioning), we can't do this reprovisioning from gt-reset
> > > path if we want to be reclaim-safe. Move VFs reprovisioning to an
> > > async worker that we will start from the gt-reset path.
> >
> > Admittedly I don't fully understand the PF restart flow here from
> > userspace. Is there a race condition we need to check for, where
> > GuC completes the base configuration before the PF config comes
> > through? Is it possible we can get into either a deadlock between
> > the native init and the PF init, or start running content on some
> > engines in native mode before the PF init completes?
>
> Even if, due to a race, we start running PF content on engines before
> we finish GuC reconfiguration from native to SRIOV mode, that content
> may just run a little longer than before the reset, due to the initial
> "infinity" execution quantum or preemption timeout settings, which in
> SRIOV mode were likely reconfigured to smaller values.
>
> Also, any race with new provisioning requests from user space should
> be harmless, since during a PF restart we will resend the whole SRIOV
> configuration, including any latest changes made between the GT reset
> and the PF restart.
Ok, thinking out loud here...
So let's say we have an application that has a submission in flight. It
is stuck in the drm scheduler for some reason. Then we get a GT reset.
Native mode is configured first, per the update here. The drm scheduler
restarts and submits the workload to GuC. GuC then submits it to HW.
While the workload is running in HW, PF mode is configured in GuC. The
workload that was running in HW then completes. Before the CSB for that
completion reaches GuC, though, the PF config completes and the KMD
replays the workload that just completed in HW. GuC will check the ring
tail in memory, see that the head and tail match, and therefore won't
submit the new workload.
If userspace was waiting on a memory update (or semaphore) for job
completion, it should go through either way, and there should be no
harm in having a second workload there even if it was submitted.
But yeah, makes sense to me, no issue here.
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
>
> - Michal
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-25 21:55 [PATCH] drm/xe/pf: Move VFs reprovisioning to worker Michal Wajdeczko
2025-01-25 22:45 ` ✓ CI.Patch_applied: success for " Patchwork
2025-01-25 22:45 ` ✓ CI.checkpatch: " Patchwork
2025-01-25 22:46 ` ✓ CI.KUnit: " Patchwork
2025-01-25 23:02 ` ✓ CI.Build: " Patchwork
2025-01-25 23:05 ` ✓ CI.Hooks: " Patchwork
2025-01-25 23:06 ` ✓ CI.checksparse: " Patchwork
2025-01-25 23:33 ` ✓ Xe.CI.BAT: " Patchwork
2025-01-26 0:41 ` ✗ Xe.CI.Full: failure " Patchwork
2025-01-27 17:07 ` Michal Wajdeczko
2025-01-27 14:23 ` [PATCH] " Michał Winiarski
2025-01-27 17:07 ` Summers, Stuart
2025-01-27 18:05 ` Michal Wajdeczko
2025-01-27 18:28 ` Summers, Stuart [this message]
2025-01-27 18:29 ` Matthew Brost