From: "Teres Alexis, Alan Previn" <alan.previn.teres.alexis@intel.com>
To: "Vivi, Rodrigo" <rodrigo.vivi@intel.com>,
"De Marchi, Lucas" <lucas.demarchi@intel.com>
Cc: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
"Somaiya, Himanshu" <himanshu.somaiya@intel.com>
Subject: Re: [PATCH 3/3] drm/xe: Force wedged state and block GT reset upon any GPU hang
Date: Fri, 15 Mar 2024 06:28:50 +0000
Message-ID: <11079d969d1127cf390eca53a843a583402f1562.camel@intel.com>
In-Reply-To: <ZfOfKnlbdIl9lXKa@intel.com>
alan:snip
On Thu, 2024-03-14 at 21:06 -0400, Rodrigo Vivi wrote:
> On Wed, Mar 13, 2024 at 11:00:06PM -0500, Lucas De Marchi wrote:
> > On Wed, Mar 13, 2024 at 06:06:14PM -0400, Rodrigo Vivi wrote:
> > > On Wed, Mar 13, 2024 at 04:54:38PM -0500, Lucas De Marchi wrote:
> > > > On Wed, Mar 13, 2024 at 05:44:00PM -0400, Rodrigo Vivi wrote:
> > > > > On Wed, Mar 13, 2024 at 03:49:56PM -0500, Lucas De Marchi
> > > > > wrote:
> > > > > > On Wed, Mar 13, 2024 at 03:54:59PM -0400, Rodrigo Vivi
> > > > > > wrote:
> > > > > >
> >
> > I think we can use the modparam on probe and already put it in ads.
> > That dictates the default behavior for the _module_ regardless of
> > the device.
>
> agreed. I already sent the 3 patches that accomplished that.
alan: personal opinion - we really ought to have runtime controls
(i.e. debugfs) for the case where an integrated + discrete combination
needs to be debugged.
>
> > Then we allow either setting the param to change the default
> > behavior
> > like above or we create a debugfs so we can set it per-device after
> > the
> > probe.
>
> I have the 4th patch in here:
> https://github.com/rodrigovivi/linux/commits/xe-busted
> that is targeting this goal. However I'm still dealing with trying to
> change
> the guc sched policy on the fly.
>
> I'm not convinced that i915 code around that 0x506 command is the
> right code,
> so I'm still investigating the spec and doing some experiments.
>
> But I'd like to move forward with this default behavior with module
> parameter so we unblock our sv teams.
>
> thoughts?
alan: IIRC guc has preemption timing per context, and it can be changed
at runtime (but it may only take effect the next time the context is
scheduled onto the engine?).
Btw, I haven't had time to go thoroughly through all the patches on the
above github branch, but based on this series, I don't see us also
preventing the runtime guc/gt-reset (which the targeted use-case also
needs to avoid). A few functions seem to be involved in this runtime
guc/gt reset (when guc fails to reset an engine), but we must be careful
not to also block the guc/gt-reset for the post-hw-config readup after
the early stage guc load. Basically we need to find the paths that write
to GDRST and block them (except for early boot, suspend-resume and
shutdown).
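
To make the gating idea above concrete, here is a minimal userspace
sketch, not driver code. Every name in it (reset_reason,
device_state, gt_reset_allowed) is hypothetical and does not come from
the xe driver; it only assumes that each path that ends up writing
GDRST could pass a reason to one common check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reasons for requesting a GT reset (illustrative names only). */
enum reset_reason {
	RESET_EARLY_GUC_LOAD,	/* post-hw-config readup after early GuC load */
	RESET_SUSPEND_RESUME,
	RESET_SHUTDOWN,
	RESET_ENGINE_HANG,	/* runtime: GuC failed to reset an engine */
	RESET_REASON_COUNT
};

/* Hypothetical per-device debug policy, e.g. set via modparam or debugfs. */
struct device_state {
	bool wedged_mode;	/* keep the hung state around for inspection */
};

/*
 * Return true if a reset may proceed. With wedged_mode off, every reset
 * is allowed. With it on, only the early-boot, suspend-resume and
 * shutdown paths are let through; runtime hang recovery is blocked.
 */
static bool gt_reset_allowed(const struct device_state *dev,
			     enum reset_reason reason)
{
	assert(dev != NULL);
	assert(reason < RESET_REASON_COUNT);

	if (!dev->wedged_mode)
		return true;

	switch (reason) {
	case RESET_EARLY_GUC_LOAD:
	case RESET_SUSPEND_RESUME:
	case RESET_SHUTDOWN:
		return true;
	default:
		return false;
	}
}
```

With wedged_mode set, a call with RESET_ENGINE_HANG is refused while
RESET_EARLY_GUC_LOAD still goes through, which is the exception list
described above. Whether the real driver can funnel all GDRST writers
through one such check is an open question.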