stable.vger.kernel.org archive mirror
* [PATCH v3] drm/xe/xe_guc_ads: save/restore OA registers
@ 2024-10-23 20:07 Jonathan Cavitt
  2024-10-28 16:36 ` Dixit, Ashutosh
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Jonathan Cavitt @ 2024-10-23 20:07 UTC (permalink / raw)
  To: intel-xe
  Cc: jonathan.cavitt, saurabhg.gupta, alex.zuo, umesh.nerlige.ramappa,
	john.c.harrison, stable

Several OA registers and allowlist registers were missing from the
save/restore list for GuC and could be lost during an engine reset.  Add
them to the list.

v2:
- Fix commit message (Umesh)
- Add missing closes (Ashutosh)

v3:
- Add missing fixes (Ashutosh)

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2249
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Suggested-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
CC: stable@vger.kernel.org # v6.11+
Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_ads.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c
index 4e746ae98888..a196c4fb90fc 100644
--- a/drivers/gpu/drm/xe/xe_guc_ads.c
+++ b/drivers/gpu/drm/xe/xe_guc_ads.c
@@ -15,6 +15,7 @@
 #include "regs/xe_engine_regs.h"
 #include "regs/xe_gt_regs.h"
 #include "regs/xe_guc_regs.h"
+#include "regs/xe_oa_regs.h"
 #include "xe_bo.h"
 #include "xe_gt.h"
 #include "xe_gt_ccs_mode.h"
@@ -740,6 +741,11 @@ static unsigned int guc_mmio_regset_write(struct xe_guc_ads *ads,
 		guc_mmio_regset_write_one(ads, regset_map, e->reg, count++);
 	}
 
+	for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++)
+		guc_mmio_regset_write_one(ads, regset_map,
+					  RING_FORCE_TO_NONPRIV(hwe->mmio_base, i),
+					  count++);
+
 	/* Wa_1607983814 */
 	if (needs_wa_1607983814(xe) && hwe->class == XE_ENGINE_CLASS_RENDER) {
 		for (i = 0; i < LNCFCMOCS_REG_COUNT; i++) {
@@ -748,6 +754,14 @@ static unsigned int guc_mmio_regset_write(struct xe_guc_ads *ads,
 		}
 	}
 
+	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL0, count++);
+	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL1, count++);
+	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL2, count++);
+	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL3, count++);
+	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL4, count++);
+	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL5, count++);
+	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL6, count++);
+
 	return count;
 }
 
-- 
2.43.0



* Re: [PATCH v3] drm/xe/xe_guc_ads: save/restore OA registers
  2024-10-23 20:07 [PATCH v3] drm/xe/xe_guc_ads: save/restore OA registers Jonathan Cavitt
@ 2024-10-28 16:36 ` Dixit, Ashutosh
  2024-10-28 20:38   ` Umesh Nerlige Ramappa
  2024-10-28 22:49 ` Dixit, Ashutosh
  2024-10-29 16:23 ` Lucas De Marchi
  2 siblings, 1 reply; 14+ messages in thread
From: Dixit, Ashutosh @ 2024-10-28 16:36 UTC (permalink / raw)
  To: Jonathan Cavitt
  Cc: intel-xe, saurabhg.gupta, alex.zuo, umesh.nerlige.ramappa,
	john.c.harrison, stable

On Wed, 23 Oct 2024 13:07:15 -0700, Jonathan Cavitt wrote:
>

Hi Umesh,

> @@ -748,6 +754,14 @@ static unsigned int guc_mmio_regset_write(struct xe_guc_ads *ads,
>		}
>	}
>
> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL0, count++);
> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL1, count++);
> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL2, count++);
> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL3, count++);
> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL4, count++);
> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL5, count++);
> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL6, count++);

I am trying to understand how this works. So these registers are
saved/restored by GuC because they are not part of HW context image and
that is why GuC needs to do the save/restore? Bspec 46458/56839 do seem to
be saying that these registers are context saved/restored? If that is
indeed true (though not sure), do they need to be here?

Thanks.
--
Ashutosh


* Re: [PATCH v3] drm/xe/xe_guc_ads: save/restore OA registers
  2024-10-28 16:36 ` Dixit, Ashutosh
@ 2024-10-28 20:38   ` Umesh Nerlige Ramappa
  2024-10-28 20:48     ` Dixit, Ashutosh
  0 siblings, 1 reply; 14+ messages in thread
From: Umesh Nerlige Ramappa @ 2024-10-28 20:38 UTC (permalink / raw)
  To: Dixit, Ashutosh
  Cc: Jonathan Cavitt, intel-xe, saurabhg.gupta, alex.zuo,
	john.c.harrison, stable

On Mon, Oct 28, 2024 at 09:36:32AM -0700, Dixit, Ashutosh wrote:
>On Wed, 23 Oct 2024 13:07:15 -0700, Jonathan Cavitt wrote:
>>
>
>Hi Umesh,
>
>> @@ -748,6 +754,14 @@ static unsigned int guc_mmio_regset_write(struct xe_guc_ads *ads,
>>		}
>>	}
>>
>> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL0, count++);
>> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL1, count++);
>> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL2, count++);
>> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL3, count++);
>> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL4, count++);
>> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL5, count++);
>> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL6, count++);
>
>I am trying to understand how this works. So these registers are
>saved/restored by GuC because they are not part of HW context image

correct.

>and that is why GuC needs to do the save/restore?

yes, only if GuC performs an engine reset

>Bspec 46458/56839 do seem to
>be saying that these registers are context saved/restored? If that is
>indeed true (though not sure), do they need to be here?

For pre-gen12 they were part of the engine context image, but not from 
gen12 onwards. From gen12, they are in the power context image.

These were added because users were seeing the EuStall and EuActive 
counters zeroed out during OA use case. GuC was doing an engine reset 
for some reason and that was resetting these registers. Once we added it 
here (so GuC would save restore these), the counters had correct values.

Regards,
Umesh

>
>Thanks.
>--
>Ashutosh
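
The save/restore behaviour Umesh describes above can be illustrated with a
toy model. This is only a sketch: toy_engine, toy_engine_reset() and the
register indices are hypothetical and do not correspond to GuC firmware or
xe driver code. It assumes that a reset clears every register and that only
registers named on a save list are written back afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_NUM_REGS 8 /* hypothetical register file size */

struct toy_engine {
	uint32_t regs[TOY_NUM_REGS];
};

/*
 * Toy engine reset: snapshot the registers named in save_list, clear the
 * whole register file, then write the snapshot back. Returns 0 on success,
 * -1 if the list is too long or names a register that does not exist.
 */
static int toy_engine_reset(struct toy_engine *e,
			    const unsigned int *save_list, unsigned int n)
{
	uint32_t saved[TOY_NUM_REGS];
	unsigned int i;

	if (n > TOY_NUM_REGS)
		return -1;
	for (i = 0; i < n; i++) {
		if (save_list[i] >= TOY_NUM_REGS)
			return -1;
		saved[i] = e->regs[save_list[i]];
	}

	memset(e->regs, 0, sizeof(e->regs));

	for (i = 0; i < n; i++)
		e->regs[save_list[i]] = saved[i];
	return 0;
}

/* Self-check: a listed register survives the reset, an unlisted one is lost. */
static void toy_selfcheck(void)
{
	struct toy_engine e = { .regs = { 0 } };
	const unsigned int save_list[] = { 2 };

	e.regs[2] = 0xabc; /* on the save list */
	e.regs[5] = 0x123; /* not on the save list */

	assert(toy_engine_reset(&e, save_list, 1) == 0);
	assert(e.regs[2] == 0xabc);
	assert(e.regs[5] == 0);
}
```

In this model, a register left off the save list reads back as zero after
the reset, which matches the symptom reported for the EuStall and EuActive
counters.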


* Re: [PATCH v3] drm/xe/xe_guc_ads: save/restore OA registers
  2024-10-28 20:38   ` Umesh Nerlige Ramappa
@ 2024-10-28 20:48     ` Dixit, Ashutosh
  0 siblings, 0 replies; 14+ messages in thread
From: Dixit, Ashutosh @ 2024-10-28 20:48 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa
  Cc: Jonathan Cavitt, intel-xe, saurabhg.gupta, alex.zuo,
	john.c.harrison, stable

On Mon, 28 Oct 2024 13:38:29 -0700, Umesh Nerlige Ramappa wrote:
>
> On Mon, Oct 28, 2024 at 09:36:32AM -0700, Dixit, Ashutosh wrote:
> > On Wed, 23 Oct 2024 13:07:15 -0700, Jonathan Cavitt wrote:
> >>
> >
> > Hi Umesh,
> >
> >> @@ -748,6 +754,14 @@ static unsigned int guc_mmio_regset_write(struct xe_guc_ads *ads,
> >>		}
> >>	}
> >>
> >> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL0, count++);
> >> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL1, count++);
> >> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL2, count++);
> >> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL3, count++);
> >> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL4, count++);
> >> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL5, count++);
> >> +	guc_mmio_regset_write_one(ads, regset_map, EU_PERF_CNTL6, count++);
> >
> > I am trying to understand how this works. So these registers are
> > saved/restored by GuC because they are not part of HW context image
>
> correct.
>
> > and that is why GuC needs to do the save/restore?
>
> yes, only if GuC performs an engine reset
>
> > Bspec 46458/56839 do seem to
> > be saying that these registers are context saved/restored? If that is
> > indeed true (though not sure), do they need to be here?
>
> For pre-gen12 they were part of the engine context image, but not from
> gen12 onwards. From gen12, they are in the power context image.
>
> These were added because users were seeing the EuStall and EuActive
> counters zeroed out during OA use case. GuC was doing an engine reset for
> some reason and that was resetting these registers. Once we added it here
> (so GuC would save restore these), the counters had correct values.

Hi Umesh, thanks for the explanation, yes let's just leave these here.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>


* Re: [PATCH v3] drm/xe/xe_guc_ads: save/restore OA registers
  2024-10-23 20:07 [PATCH v3] drm/xe/xe_guc_ads: save/restore OA registers Jonathan Cavitt
  2024-10-28 16:36 ` Dixit, Ashutosh
@ 2024-10-28 22:49 ` Dixit, Ashutosh
  2024-10-29 16:23 ` Lucas De Marchi
  2 siblings, 0 replies; 14+ messages in thread
From: Dixit, Ashutosh @ 2024-10-28 22:49 UTC (permalink / raw)
  To: Jonathan Cavitt
  Cc: intel-xe, saurabhg.gupta, alex.zuo, umesh.nerlige.ramappa,
	john.c.harrison, stable

On Wed, 23 Oct 2024 13:07:15 -0700, Jonathan Cavitt wrote:
>
> Several OA registers and allowlist registers were missing from the
> save/restore list for GuC and could be lost during an engine reset.  Add
> them to the list.
>
> v2:
> - Fix commit message (Umesh)
> - Add missing closes (Ashutosh)
>
> v3:
> - Add missing fixes (Ashutosh)
>
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2249
> Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> Suggested-by: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> CC: stable@vger.kernel.org # v6.11+
> Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Thanks for chasing this and the patch, merged to drm-xe-next.


* Re: [PATCH v3] drm/xe/xe_guc_ads: save/restore OA registers
  2024-10-23 20:07 [PATCH v3] drm/xe/xe_guc_ads: save/restore OA registers Jonathan Cavitt
  2024-10-28 16:36 ` Dixit, Ashutosh
  2024-10-28 22:49 ` Dixit, Ashutosh
@ 2024-10-29 16:23 ` Lucas De Marchi
  2024-10-29 17:15   ` Dixit, Ashutosh
  2 siblings, 1 reply; 14+ messages in thread
From: Lucas De Marchi @ 2024-10-29 16:23 UTC (permalink / raw)
  To: Jonathan Cavitt
  Cc: intel-xe, saurabhg.gupta, alex.zuo, umesh.nerlige.ramappa,
	john.c.harrison, stable

On Wed, Oct 23, 2024 at 08:07:15PM +0000, Jonathan Cavitt wrote:
>Several OA registers and allowlist registers were missing from the
>save/restore list for GuC and could be lost during an engine reset.  Add
>them to the list.
>
>v2:
>- Fix commit message (Umesh)
>- Add missing closes (Ashutosh)
>
>v3:
>- Add missing fixes (Ashutosh)
>
>Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2249
>Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
>Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>Suggested-by: John Harrison <john.c.harrison@intel.com>
>Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
>CC: stable@vger.kernel.org # v6.11+
>Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>---
> drivers/gpu/drm/xe/xe_guc_ads.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
>diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c
>index 4e746ae98888..a196c4fb90fc 100644
>--- a/drivers/gpu/drm/xe/xe_guc_ads.c
>+++ b/drivers/gpu/drm/xe/xe_guc_ads.c
>@@ -15,6 +15,7 @@
> #include "regs/xe_engine_regs.h"
> #include "regs/xe_gt_regs.h"
> #include "regs/xe_guc_regs.h"
>+#include "regs/xe_oa_regs.h"
> #include "xe_bo.h"
> #include "xe_gt.h"
> #include "xe_gt_ccs_mode.h"
>@@ -740,6 +741,11 @@ static unsigned int guc_mmio_regset_write(struct xe_guc_ads *ads,
> 		guc_mmio_regset_write_one(ads, regset_map, e->reg, count++);
> 	}
>
>+	for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++)
>+		guc_mmio_regset_write_one(ads, regset_map,
>+					  RING_FORCE_TO_NONPRIV(hwe->mmio_base, i),
>+					  count++);

this is not the proper place. See drivers/gpu/drm/xe/xe_reg_whitelist.c.

The loop just before these added lines should be sufficient to go over
all engine save/restore register and give them to guc.

Lucas De Marchi


* Re: [PATCH v3] drm/xe/xe_guc_ads: save/restore OA registers
  2024-10-29 16:23 ` Lucas De Marchi
@ 2024-10-29 17:15   ` Dixit, Ashutosh
  2024-10-29 17:32     ` Lucas De Marchi
  0 siblings, 1 reply; 14+ messages in thread
From: Dixit, Ashutosh @ 2024-10-29 17:15 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: Jonathan Cavitt, intel-xe, saurabhg.gupta, alex.zuo,
	umesh.nerlige.ramappa, john.c.harrison, stable

On Tue, 29 Oct 2024 09:23:49 -0700, Lucas De Marchi wrote:
>
> On Wed, Oct 23, 2024 at 08:07:15PM +0000, Jonathan Cavitt wrote:
> > Several OA registers and allowlist registers were missing from the
> > save/restore list for GuC and could be lost during an engine reset.  Add
> > them to the list.
> >
> > v2:
> > - Fix commit message (Umesh)
> > - Add missing closes (Ashutosh)
> >
> > v3:
> > - Add missing fixes (Ashutosh)
> >
> > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2249
> > Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> > Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > Suggested-by: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> > CC: stable@vger.kernel.org # v6.11+
> > Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> > Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_guc_ads.c | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c
> > index 4e746ae98888..a196c4fb90fc 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_ads.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_ads.c
> > @@ -15,6 +15,7 @@
> > #include "regs/xe_engine_regs.h"
> > #include "regs/xe_gt_regs.h"
> > #include "regs/xe_guc_regs.h"
> > +#include "regs/xe_oa_regs.h"
> > #include "xe_bo.h"
> > #include "xe_gt.h"
> > #include "xe_gt_ccs_mode.h"
> > @@ -740,6 +741,11 @@ static unsigned int guc_mmio_regset_write(struct xe_guc_ads *ads,
> >		guc_mmio_regset_write_one(ads, regset_map, e->reg, count++);
> >	}
> >
> > +	for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++)
> > +		guc_mmio_regset_write_one(ads, regset_map,
> > +					  RING_FORCE_TO_NONPRIV(hwe->mmio_base, i),
> > +					  count++);
>
> this is not the proper place. See drivers/gpu/drm/xe/xe_reg_whitelist.c.

Yikes, this got merged yesterday.

>
> The loop just before these added lines should be sufficient to go over
> all engine save/restore register and give them to guc.

You probably mean this one?

	xa_for_each(&hwe->reg_sr.xa, idx, entry)
		guc_mmio_regset_write_one(ads, regset_map, entry->reg, count++);

But then how come this patch fixed GL #2249?

Ashutosh


* Re: [PATCH v3] drm/xe/xe_guc_ads: save/restore OA registers
  2024-10-29 17:15   ` Dixit, Ashutosh
@ 2024-10-29 17:32     ` Lucas De Marchi
  2024-10-29 19:33       ` Matt Roper
  2024-10-29 19:38       ` Dixit, Ashutosh
  0 siblings, 2 replies; 14+ messages in thread
From: Lucas De Marchi @ 2024-10-29 17:32 UTC (permalink / raw)
  To: Dixit, Ashutosh
  Cc: Jonathan Cavitt, intel-xe, saurabhg.gupta, alex.zuo,
	umesh.nerlige.ramappa, john.c.harrison, stable

On Tue, Oct 29, 2024 at 10:15:54AM -0700, Ashutosh Dixit wrote:
>On Tue, 29 Oct 2024 09:23:49 -0700, Lucas De Marchi wrote:
>>
>> On Wed, Oct 23, 2024 at 08:07:15PM +0000, Jonathan Cavitt wrote:
>> > Several OA registers and allowlist registers were missing from the
>> > save/restore list for GuC and could be lost during an engine reset.  Add
>> > them to the list.
>> >
>> > v2:
>> > - Fix commit message (Umesh)
>> > - Add missing closes (Ashutosh)
>> >
>> > v3:
>> > - Add missing fixes (Ashutosh)
>> >
>> > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2249
>> > Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
>> > Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> > Suggested-by: John Harrison <john.c.harrison@intel.com>
>> > Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
>> > CC: stable@vger.kernel.org # v6.11+
>> > Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>> > Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> > ---
>> > drivers/gpu/drm/xe/xe_guc_ads.c | 14 ++++++++++++++
>> > 1 file changed, 14 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c
>> > index 4e746ae98888..a196c4fb90fc 100644
>> > --- a/drivers/gpu/drm/xe/xe_guc_ads.c
>> > +++ b/drivers/gpu/drm/xe/xe_guc_ads.c
>> > @@ -15,6 +15,7 @@
>> > #include "regs/xe_engine_regs.h"
>> > #include "regs/xe_gt_regs.h"
>> > #include "regs/xe_guc_regs.h"
>> > +#include "regs/xe_oa_regs.h"
>> > #include "xe_bo.h"
>> > #include "xe_gt.h"
>> > #include "xe_gt_ccs_mode.h"
>> > @@ -740,6 +741,11 @@ static unsigned int guc_mmio_regset_write(struct xe_guc_ads *ads,
>> >		guc_mmio_regset_write_one(ads, regset_map, e->reg, count++);
>> >	}
>> >
>> > +	for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++)
>> > +		guc_mmio_regset_write_one(ads, regset_map,
>> > +					  RING_FORCE_TO_NONPRIV(hwe->mmio_base, i),
>> > +					  count++);
>>
>> this is not the proper place. See drivers/gpu/drm/xe/xe_reg_whitelist.c.
>
>Yikes, this got merged yesterday.
>
>>
>> The loop just before these added lines should be sufficient to go over
>> all engine save/restore register and give them to guc.
>
>You probably mean this one?
>
>	xa_for_each(&hwe->reg_sr.xa, idx, entry)
>		guc_mmio_regset_write_one(ads, regset_map, entry->reg, count++);
>
>But then how come this patch fixed GL #2249?

it fixes, it just doesn't put it in the right place according to the
driver arch. Whitelists should be in that other file so it shows up in
debugfs, (/sys/kernel/debug/dri/*/*/register-save-restore), detect
clashes when we try to add the same register, etc.


Lucas De Marchi

>
>Ashutosh


* Re: [PATCH v3] drm/xe/xe_guc_ads: save/restore OA registers
  2024-10-29 17:32     ` Lucas De Marchi
@ 2024-10-29 19:33       ` Matt Roper
  2024-10-29 19:44         ` Dixit, Ashutosh
                           ` (2 more replies)
  2024-10-29 19:38       ` Dixit, Ashutosh
  1 sibling, 3 replies; 14+ messages in thread
From: Matt Roper @ 2024-10-29 19:33 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: Dixit, Ashutosh, Jonathan Cavitt, intel-xe, saurabhg.gupta,
	alex.zuo, umesh.nerlige.ramappa, john.c.harrison, stable

On Tue, Oct 29, 2024 at 12:32:54PM -0500, Lucas De Marchi wrote:
> On Tue, Oct 29, 2024 at 10:15:54AM -0700, Ashutosh Dixit wrote:
> > On Tue, 29 Oct 2024 09:23:49 -0700, Lucas De Marchi wrote:
> > > 
> > > On Wed, Oct 23, 2024 at 08:07:15PM +0000, Jonathan Cavitt wrote:
> > > > Several OA registers and allowlist registers were missing from the
> > > > save/restore list for GuC and could be lost during an engine reset.  Add
> > > > them to the list.
> > > >
> > > > v2:
> > > > - Fix commit message (Umesh)
> > > > - Add missing closes (Ashutosh)
> > > >
> > > > v3:
> > > > - Add missing fixes (Ashutosh)
> > > >
> > > > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2249
> > > > Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> > > > Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > > > Suggested-by: John Harrison <john.c.harrison@intel.com>
> > > > Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> > > > CC: stable@vger.kernel.org # v6.11+
> > > > Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> > > > Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/xe_guc_ads.c | 14 ++++++++++++++
> > > > 1 file changed, 14 insertions(+)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c
> > > > index 4e746ae98888..a196c4fb90fc 100644
> > > > --- a/drivers/gpu/drm/xe/xe_guc_ads.c
> > > > +++ b/drivers/gpu/drm/xe/xe_guc_ads.c
> > > > @@ -15,6 +15,7 @@
> > > > #include "regs/xe_engine_regs.h"
> > > > #include "regs/xe_gt_regs.h"
> > > > #include "regs/xe_guc_regs.h"
> > > > +#include "regs/xe_oa_regs.h"
> > > > #include "xe_bo.h"
> > > > #include "xe_gt.h"
> > > > #include "xe_gt_ccs_mode.h"
> > > > @@ -740,6 +741,11 @@ static unsigned int guc_mmio_regset_write(struct xe_guc_ads *ads,
> > > >		guc_mmio_regset_write_one(ads, regset_map, e->reg, count++);
> > > >	}
> > > >
> > > > +	for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++)
> > > > +		guc_mmio_regset_write_one(ads, regset_map,
> > > > +					  RING_FORCE_TO_NONPRIV(hwe->mmio_base, i),
> > > > +					  count++);
> > > 
> > > this is not the proper place. See drivers/gpu/drm/xe/xe_reg_whitelist.c.
> > 
> > Yikes, this got merged yesterday.
> > 
> > > 
> > > The loop just before these added lines should be sufficient to go over
> > > all engine save/restore register and give them to guc.
> > 
> > You probably mean this one?
> > 
> > 	xa_for_each(&hwe->reg_sr.xa, idx, entry)
> > 		guc_mmio_regset_write_one(ads, regset_map, entry->reg, count++);
> > 
> > But then how come this patch fixed GL #2249?
> 
> it fixes, it just doesn't put it in the right place according to the
> driver arch. Whitelists should be in that other file so it shows up in
> debugfs, (/sys/kernel/debug/dri/*/*/register-save-restore), detect
> clashes when we try to add the same register, etc.

Also, this patch failed pre-merge BAT since it added new regset entries
that we never actually allocated storage space for.  Now that it's been
applied, we're seeing CI failures on lots of tests from this:

https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3295


Matt

> 
> 
> Lucas De Marchi
> 
> > 
> > Ashutosh

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
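
The failure mode Matt describes, entries written without storage having
been allocated for them, can be sketched as a mismatch between a sizing
pass and a write pass. Everything below is hypothetical: the sketch_* names,
the slot count of 12 and the dummy offsets are assumptions made for
illustration and are not the xe driver's code. The only figure taken from
the patch is the seven EU_PERF_CNTL registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the xe driver's real types or constants. */
#define SKETCH_EXTRA_NONPRIV 12 /* assumed slot count, for illustration */
#define SKETCH_EXTRA_OA       7 /* EU_PERF_CNTL0..6 from the patch */

struct sketch_regset {
	uint32_t *entries; /* backing storage, sized up front */
	size_t capacity;   /* number of entries the storage can hold */
	size_t count;      /* entries written so far */
};

/* Sizing pass: must account for every entry the write pass will emit. */
static size_t sketch_regset_size(size_t sr_entries, int include_extras)
{
	size_t n = sr_entries;

	if (include_extras)
		n += SKETCH_EXTRA_NONPRIV + SKETCH_EXTRA_OA;
	return n;
}

/* Write one entry; refuse instead of overrunning the buffer. */
static int sketch_regset_write_one(struct sketch_regset *rs, uint32_t offset)
{
	if (rs->count >= rs->capacity)
		return -1;
	rs->entries[rs->count++] = offset;
	return 0;
}

/* Write pass: always emits sr_entries plus the 12 + 7 extra entries. */
static int sketch_regset_write(struct sketch_regset *rs, size_t sr_entries)
{
	size_t total = sr_entries + SKETCH_EXTRA_NONPRIV + SKETCH_EXTRA_OA;
	size_t i;

	for (i = 0; i < total; i++)
		if (sketch_regset_write_one(rs, (uint32_t)(0x1000 + 4 * i)))
			return -1; /* sizing pass did not cover this entry */
	return 0;
}

/* Self-check: an undersized buffer is detected, a correctly sized one is not. */
static void sketch_regset_selfcheck(void)
{
	uint32_t buf[64];
	struct sketch_regset small = { buf, sketch_regset_size(10, 0), 0 };
	struct sketch_regset full = { buf, sketch_regset_size(10, 1), 0 };

	assert(sketch_regset_write(&small, 10) == -1);
	assert(sketch_regset_write(&full, 10) == 0);
}
```

The point of the sketch is the invariant, not the numbers: whenever the
write pass gains entries, the sizing pass has to grow by the same amount.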


* Re: [PATCH v3] drm/xe/xe_guc_ads: save/restore OA registers
  2024-10-29 17:32     ` Lucas De Marchi
  2024-10-29 19:33       ` Matt Roper
@ 2024-10-29 19:38       ` Dixit, Ashutosh
  1 sibling, 0 replies; 14+ messages in thread
From: Dixit, Ashutosh @ 2024-10-29 19:38 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: Jonathan Cavitt, intel-xe, saurabhg.gupta, alex.zuo,
	umesh.nerlige.ramappa, john.c.harrison, stable

On Tue, 29 Oct 2024 10:32:54 -0700, Lucas De Marchi wrote:
>

Hi Lucas,

> On Tue, Oct 29, 2024 at 10:15:54AM -0700, Ashutosh Dixit wrote:
> > On Tue, 29 Oct 2024 09:23:49 -0700, Lucas De Marchi wrote:
> >>
> >> On Wed, Oct 23, 2024 at 08:07:15PM +0000, Jonathan Cavitt wrote:
> >> > Several OA registers and allowlist registers were missing from the
> >> > save/restore list for GuC and could be lost during an engine reset.  Add
> >> > them to the list.
> >> >
> >> > v2:
> >> > - Fix commit message (Umesh)
> >> > - Add missing closes (Ashutosh)
> >> >
> >> > v3:
> >> > - Add missing fixes (Ashutosh)
> >> >
> >> > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2249
> >> > Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> >> > Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >> > Suggested-by: John Harrison <john.c.harrison@intel.com>
> >> > Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> >> > CC: stable@vger.kernel.org # v6.11+
> >> > Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> >> > Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >> > ---
> >> > drivers/gpu/drm/xe/xe_guc_ads.c | 14 ++++++++++++++
> >> > 1 file changed, 14 insertions(+)
> >> >
> >> > diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c
> >> > index 4e746ae98888..a196c4fb90fc 100644
> >> > --- a/drivers/gpu/drm/xe/xe_guc_ads.c
> >> > +++ b/drivers/gpu/drm/xe/xe_guc_ads.c
> >> > @@ -15,6 +15,7 @@
> >> > #include "regs/xe_engine_regs.h"
> >> > #include "regs/xe_gt_regs.h"
> >> > #include "regs/xe_guc_regs.h"
> >> > +#include "regs/xe_oa_regs.h"
> >> > #include "xe_bo.h"
> >> > #include "xe_gt.h"
> >> > #include "xe_gt_ccs_mode.h"
> >> > @@ -740,6 +741,11 @@ static unsigned int guc_mmio_regset_write(struct xe_guc_ads *ads,
> >> >		guc_mmio_regset_write_one(ads, regset_map, e->reg, count++);
> >> >	}
> >> >
> >> > +	for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++)
> >> > +		guc_mmio_regset_write_one(ads, regset_map,
> >> > +					  RING_FORCE_TO_NONPRIV(hwe->mmio_base, i),
> >> > +					  count++);
> >>
> >> this is not the proper place. See drivers/gpu/drm/xe/xe_reg_whitelist.c.
> >
> > Yikes, this got merged yesterday.
> >
> >>
> >> The loop just before these added lines should be sufficient to go over
> >> all engine save/restore register and give them to guc.
> >
> > You probably mean this one?
> >
> >	xa_for_each(&hwe->reg_sr.xa, idx, entry)
> >		guc_mmio_regset_write_one(ads, regset_map, entry->reg, count++);
> >
> > But then how come this patch fixed GL #2249?
>
> it fixes, it just doesn't put it in the right place according to the
> driver arch. Whitelists should be in that other file so it shows up in
> debugfs, (/sys/kernel/debug/dri/*/*/register-save-restore), detect
> clashes when we try to add the same register, etc.

Sorry, still not following. OA registers are in xe_reg_whitelist.c (see
entries for "oa_reg_render" and "oa_reg_compute" in that file). To
whitelist registers, the registers need to be added to NONPRIV
registers. This loop mentioned above:

	xa_for_each(&hwe->reg_sr.xa, idx, entry)
		guc_mmio_regset_write_one(ads, regset_map, entry->reg, count++);

seems to add the original OA registers to GuC save/restore list. But this
new code:

	for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++)
		guc_mmio_regset_write_one(ads, regset_map,
					  RING_FORCE_TO_NONPRIV(hwe->mmio_base, i),
					  count++);

Now adds the NONPRIV registers to GuC save/restore list (which fixes GL
#2249). So not sure what is not in the right place, adding to GuC
save/restore list is right here where the code is added.

Also we don't want to whitelist NONPRIV registers, we only want to add them
to GuC save/restore list.

Thanks.
--
Ashutosh
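
The second loop quoted above walks every NONPRIV slot of an engine, whether
or not the slot is in use. A minimal sketch of that enumeration follows. The
slot offset (0x4d0), stride (4 bytes), slot count (12) and the example
engine base (0x2000) are assumptions made for illustration, and the
sketch_nonpriv_* helpers are hypothetical; the authoritative definitions of
RING_FORCE_TO_NONPRIV() and RING_MAX_NONPRIV_SLOTS live in the driver's
register headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the NONPRIV slots, for illustration only. */
#define SKETCH_NONPRIV_OFFSET 0x4d0u
#define SKETCH_NONPRIV_STRIDE 4u
#define SKETCH_NONPRIV_SLOTS  12u

/* Offset of one NONPRIV slot register relative to the engine's MMIO base. */
static uint32_t sketch_nonpriv_slot(uint32_t mmio_base, unsigned int slot)
{
	return mmio_base + SKETCH_NONPRIV_OFFSET + slot * SKETCH_NONPRIV_STRIDE;
}

/*
 * Collect every slot offset for one engine into out[], stopping early if
 * out[] is full. Returns the number of offsets written.
 */
static unsigned int sketch_nonpriv_collect(uint32_t mmio_base, uint32_t *out,
					   unsigned int cap)
{
	unsigned int i, count = 0;

	for (i = 0; i < SKETCH_NONPRIV_SLOTS && count < cap; i++)
		out[count++] = sketch_nonpriv_slot(mmio_base, i);
	return count;
}

/* Self-check against the assumed layout and an assumed engine base. */
static void sketch_nonpriv_selfcheck(void)
{
	uint32_t out[16];

	assert(sketch_nonpriv_collect(0x2000, out, 16) == 12);
	assert(out[0] == 0x24d0);
	assert(out[11] == 0x24fc);
}
```

Under these assumptions the loop contributes a fixed number of entries per
engine, independent of how many registers are actually whitelisted.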


* Re: [PATCH v3] drm/xe/xe_guc_ads: save/restore OA registers
  2024-10-29 19:33       ` Matt Roper
@ 2024-10-29 19:44         ` Dixit, Ashutosh
  2024-10-29 19:50           ` Lucas De Marchi
  2024-10-29 19:46         ` Lucas De Marchi
  2024-10-29 21:19         ` Umesh Nerlige Ramappa
  2 siblings, 1 reply; 14+ messages in thread
From: Dixit, Ashutosh @ 2024-10-29 19:44 UTC (permalink / raw)
  To: Matt Roper
  Cc: Lucas De Marchi, Jonathan Cavitt, intel-xe, saurabhg.gupta,
	alex.zuo, umesh.nerlige.ramappa, john.c.harrison, stable

On Tue, 29 Oct 2024 12:33:13 -0700, Matt Roper wrote:
>
> On Tue, Oct 29, 2024 at 12:32:54PM -0500, Lucas De Marchi wrote:
> > On Tue, Oct 29, 2024 at 10:15:54AM -0700, Ashutosh Dixit wrote:
> > > On Tue, 29 Oct 2024 09:23:49 -0700, Lucas De Marchi wrote:
> > > >
> > > > On Wed, Oct 23, 2024 at 08:07:15PM +0000, Jonathan Cavitt wrote:
> > > > > Several OA registers and allowlist registers were missing from the
> > > > > save/restore list for GuC and could be lost during an engine reset.  Add
> > > > > them to the list.
> > > > >
> > > > > v2:
> > > > > - Fix commit message (Umesh)
> > > > > - Add missing closes (Ashutosh)
> > > > >
> > > > > v3:
> > > > > - Add missing fixes (Ashutosh)
> > > > >
> > > > > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2249
> > > > > Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> > > > > Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > > > > Suggested-by: John Harrison <john.c.harrison@intel.com>
> > > > > Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> > > > > CC: stable@vger.kernel.org # v6.11+
> > > > > Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> > > > > Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > > > > ---
> > > > > drivers/gpu/drm/xe/xe_guc_ads.c | 14 ++++++++++++++
> > > > > 1 file changed, 14 insertions(+)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c
> > > > > index 4e746ae98888..a196c4fb90fc 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_guc_ads.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_guc_ads.c
> > > > > @@ -15,6 +15,7 @@
> > > > > #include "regs/xe_engine_regs.h"
> > > > > #include "regs/xe_gt_regs.h"
> > > > > #include "regs/xe_guc_regs.h"
> > > > > +#include "regs/xe_oa_regs.h"
> > > > > #include "xe_bo.h"
> > > > > #include "xe_gt.h"
> > > > > #include "xe_gt_ccs_mode.h"
> > > > > @@ -740,6 +741,11 @@ static unsigned int guc_mmio_regset_write(struct xe_guc_ads *ads,
> > > > >		guc_mmio_regset_write_one(ads, regset_map, e->reg, count++);
> > > > >	}
> > > > >
> > > > > +	for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++)
> > > > > +		guc_mmio_regset_write_one(ads, regset_map,
> > > > > +					  RING_FORCE_TO_NONPRIV(hwe->mmio_base, i),
> > > > > +					  count++);
> > > >
> > > > this is not the proper place. See drivers/gpu/drm/xe/xe_reg_whitelist.c.
> > >
> > > Yikes, this got merged yesterday.
> > >
> > > >
> > > > The loop just before these added lines should be sufficient to go over
> > > > all engine save/restore register and give them to guc.
> > >
> > > You probably mean this one?
> > >
> > >	xa_for_each(&hwe->reg_sr.xa, idx, entry)
> > >		guc_mmio_regset_write_one(ads, regset_map, entry->reg, count++);
> > >
> > > But then how come this patch fixed GL #2249?
> >
> > it fixes, it just doesn't put it in the right place according to the
> > driver arch. Whitelists should be in that other file so it shows up in
> > debugfs, (/sys/kernel/debug/dri/*/*/register-save-restore), detect
> > clashes when we try to add the same register, etc.
>
> Also, this patch failed pre-merge BAT since it added new regset entries
> that we never actually allocated storage space for.  Now that it's been
> applied, we're seeing CI failures on lots of tests from this:
>
> https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3295

Wow, truly sorry, completely missed that BAT failures were due to this
patch. How about we just revert this patch for now and redo it later?
Unless you or Lucas know how to fix this immediately (I don't).

Thanks.
--
Ashutosh


* Re: [PATCH v3] drm/xe/xe_guc_ads: save/restore OA registers
  2024-10-29 19:33       ` Matt Roper
  2024-10-29 19:44         ` Dixit, Ashutosh
@ 2024-10-29 19:46         ` Lucas De Marchi
  2024-10-29 21:19         ` Umesh Nerlige Ramappa
  2 siblings, 0 replies; 14+ messages in thread
From: Lucas De Marchi @ 2024-10-29 19:46 UTC (permalink / raw)
  To: Matt Roper
  Cc: Dixit, Ashutosh, Jonathan Cavitt, intel-xe, saurabhg.gupta,
	alex.zuo, umesh.nerlige.ramappa, john.c.harrison, stable

On Tue, Oct 29, 2024 at 12:33:13PM -0700, Matt Roper wrote:
>On Tue, Oct 29, 2024 at 12:32:54PM -0500, Lucas De Marchi wrote:
>> On Tue, Oct 29, 2024 at 10:15:54AM -0700, Ashutosh Dixit wrote:
>> > On Tue, 29 Oct 2024 09:23:49 -0700, Lucas De Marchi wrote:
>> > >
>> > > On Wed, Oct 23, 2024 at 08:07:15PM +0000, Jonathan Cavitt wrote:
>> > > > Several OA registers and allowlist registers were missing from the
>> > > > save/restore list for GuC and could be lost during an engine reset.  Add
>> > > > them to the list.
>> > > >
>> > > > v2:
>> > > > - Fix commit message (Umesh)
>> > > > - Add missing closes (Ashutosh)
>> > > >
>> > > > v3:
>> > > > - Add missing fixes (Ashutosh)
>> > > >
>> > > > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2249
>> > > > Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
>> > > > Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> > > > Suggested-by: John Harrison <john.c.harrison@intel.com>
>> > > > Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
>> > > > CC: stable@vger.kernel.org # v6.11+
>> > > > Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>> > > > Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> > > > ---
>> > > > drivers/gpu/drm/xe/xe_guc_ads.c | 14 ++++++++++++++
>> > > > 1 file changed, 14 insertions(+)
>> > > >
>> > > > diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c
>> > > > index 4e746ae98888..a196c4fb90fc 100644
>> > > > --- a/drivers/gpu/drm/xe/xe_guc_ads.c
>> > > > +++ b/drivers/gpu/drm/xe/xe_guc_ads.c
>> > > > @@ -15,6 +15,7 @@
>> > > > #include "regs/xe_engine_regs.h"
>> > > > #include "regs/xe_gt_regs.h"
>> > > > #include "regs/xe_guc_regs.h"
>> > > > +#include "regs/xe_oa_regs.h"
>> > > > #include "xe_bo.h"
>> > > > #include "xe_gt.h"
>> > > > #include "xe_gt_ccs_mode.h"
>> > > > @@ -740,6 +741,11 @@ static unsigned int guc_mmio_regset_write(struct xe_guc_ads *ads,
>> > > >		guc_mmio_regset_write_one(ads, regset_map, e->reg, count++);
>> > > >	}
>> > > >
>> > > > +	for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++)
>> > > > +		guc_mmio_regset_write_one(ads, regset_map,
>> > > > +					  RING_FORCE_TO_NONPRIV(hwe->mmio_base, i),
>> > > > +					  count++);
>> > >
>> > > this is not the proper place. See drivers/gpu/drm/xe/xe_reg_whitelist.c.
>> >
>> > Yikes, this got merged yesterday.
>> >
>> > >
>> > > The loop just before these added lines should be sufficient to go over
>> > > all engine save/restore register and give them to guc.
>> >
>> > You probably mean this one?
>> >
>> > 	xa_for_each(&hwe->reg_sr.xa, idx, entry)
>> > 		guc_mmio_regset_write_one(ads, regset_map, entry->reg, count++);
>> >
>> > But then how come this patch fixed GL #2249?
>>
>> it fixes, it just doesn't put it in the right place according to the
>> driver arch. Whitelists should be in that other file so it shows up in
>> debugfs, (/sys/kernel/debug/dri/*/*/register-save-restore), detect
>> clashes when we try to add the same register, etc.
>
>Also, this patch failed pre-merge BAT since it added new regset entries
>that we never actually allocated storage space for.  Now that it's been
>applied, we're seeing CI failures on lots of tests from this:
>
>https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3295

right... it missed updating the function calculate_regset_size()
to account for these additional registers.
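
To illustrate the mismatch described above, here is a minimal standalone
sketch. It is not the xe driver code: sizing_pass(), write_pass(),
build_regset(), EXTRA_SLOTS and MAX_ENTRIES are hypothetical stand-ins, and
the register offsets are made up. It only shows the general pattern in which
one pass computes how many entries to allocate and a second pass fills them,
so that any entry written by the second pass but not counted by the first has
no storage.

```c
#include <assert.h>
#include <stddef.h>

#define EXTRA_SLOTS 12	/* assumed number of additional registers */
#define MAX_ENTRIES 64	/* upper bound for this toy example only */

struct regset {
	unsigned int regs[MAX_ENTRIES];
	size_t capacity;	/* entries "allocated" by the sizing pass */
};

/* Sizing pass: must count every entry the write pass will emit. */
static size_t sizing_pass(size_t sr_entries, int count_extra)
{
	return sr_entries + (count_extra ? EXTRA_SLOTS : 0);
}

/* Store one entry; fail instead of writing past the allocated space. */
static int write_one(struct regset *rs, unsigned int reg, size_t idx)
{
	if (idx >= rs->capacity)
		return -1;
	rs->regs[idx] = reg;
	return 0;
}

/* Write pass: emits the save/restore entries, then the extra ones. */
static int write_pass(struct regset *rs, size_t sr_entries)
{
	size_t count = 0, i;

	for (i = 0; i < sr_entries; i++)
		if (write_one(rs, 0x1000u + 4u * (unsigned int)i, count++))
			return -1;

	for (i = 0; i < EXTRA_SLOTS; i++)
		if (write_one(rs, 0x2000u + 4u * (unsigned int)i, count++))
			return -1;

	return 0;
}

/* Returns 0 if both passes agree, -1 if the write pass ran out of space. */
static int build_regset(size_t sr_entries, int count_extra)
{
	struct regset rs = { .capacity = sizing_pass(sr_entries, count_extra) };

	assert(rs.capacity <= MAX_ENTRIES);
	return write_pass(&rs, sr_entries);
}
```

With 8 save/restore entries, build_regset(8, 0) fails because the sizing pass
did not count the extra entries, while build_regset(8, 1) succeeds.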

Lucas De Marchi

* Re: [PATCH v3] drm/xe/xe_guc_ads: save/restore OA registers
  2024-10-29 19:44         ` Dixit, Ashutosh
@ 2024-10-29 19:50           ` Lucas De Marchi
  0 siblings, 0 replies; 14+ messages in thread
From: Lucas De Marchi @ 2024-10-29 19:50 UTC (permalink / raw)
  To: Dixit, Ashutosh
  Cc: Matt Roper, Jonathan Cavitt, intel-xe, saurabhg.gupta, alex.zuo,
	umesh.nerlige.ramappa, john.c.harrison, stable

On Tue, Oct 29, 2024 at 12:44:02PM -0700, Ashutosh Dixit wrote:
>On Tue, 29 Oct 2024 12:33:13 -0700, Matt Roper wrote:
>>
>> On Tue, Oct 29, 2024 at 12:32:54PM -0500, Lucas De Marchi wrote:
>> > On Tue, Oct 29, 2024 at 10:15:54AM -0700, Ashutosh Dixit wrote:
>> > > On Tue, 29 Oct 2024 09:23:49 -0700, Lucas De Marchi wrote:
>> > > >
>> > > > On Wed, Oct 23, 2024 at 08:07:15PM +0000, Jonathan Cavitt wrote:
>> > > > > Several OA registers and allowlist registers were missing from the
>> > > > > save/restore list for GuC and could be lost during an engine reset.  Add
>> > > > > them to the list.
>> > > > >
>> > > > > v2:
>> > > > > - Fix commit message (Umesh)
>> > > > > - Add missing closes (Ashutosh)
>> > > > >
>> > > > > v3:
>> > > > > - Add missing fixes (Ashutosh)
>> > > > >
>> > > > > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2249
>> > > > > Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
>> > > > > Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> > > > > Suggested-by: John Harrison <john.c.harrison@intel.com>
>> > > > > Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
>> > > > > CC: stable@vger.kernel.org # v6.11+
>> > > > > Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>> > > > > Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> > > > > ---
>> > > > > drivers/gpu/drm/xe/xe_guc_ads.c | 14 ++++++++++++++
>> > > > > 1 file changed, 14 insertions(+)
>> > > > >
>> > > > > diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c
>> > > > > index 4e746ae98888..a196c4fb90fc 100644
>> > > > > --- a/drivers/gpu/drm/xe/xe_guc_ads.c
>> > > > > +++ b/drivers/gpu/drm/xe/xe_guc_ads.c
>> > > > > @@ -15,6 +15,7 @@
>> > > > > #include "regs/xe_engine_regs.h"
>> > > > > #include "regs/xe_gt_regs.h"
>> > > > > #include "regs/xe_guc_regs.h"
>> > > > > +#include "regs/xe_oa_regs.h"
>> > > > > #include "xe_bo.h"
>> > > > > #include "xe_gt.h"
>> > > > > #include "xe_gt_ccs_mode.h"
>> > > > > @@ -740,6 +741,11 @@ static unsigned int guc_mmio_regset_write(struct xe_guc_ads *ads,
>> > > > >		guc_mmio_regset_write_one(ads, regset_map, e->reg, count++);
>> > > > >	}
>> > > > >
>> > > > > +	for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++)
>> > > > > +		guc_mmio_regset_write_one(ads, regset_map,
>> > > > > +					  RING_FORCE_TO_NONPRIV(hwe->mmio_base, i),
>> > > > > +					  count++);
>> > > >
>> > > > this is not the proper place. See drivers/gpu/drm/xe/xe_reg_whitelist.c.
>> > >
>> > > Yikes, this got merged yesterday.
>> > >
>> > > >
>> > > > The loop just before these added lines should be sufficient to go over
>> > > > all engine save/restore register and give them to guc.
>> > >
>> > > You probably mean this one?
>> > >
>> > >	xa_for_each(&hwe->reg_sr.xa, idx, entry)
>> > >		guc_mmio_regset_write_one(ads, regset_map, entry->reg, count++);
>> > >
>> > > But then how come this patch fixed GL #2249?
>> >
>> > it fixes, it just doesn't put it in the right place according to the
>> > driver arch. Whitelists should be in that other file so it shows up in
>> > debugfs, (/sys/kernel/debug/dri/*/*/register-save-restore), detect
>> > clashes when we try to add the same register, etc.
>>
>> Also, this patch failed pre-merge BAT since it added new regset entries
>> that we never actually allocated storage space for.  Now that it's been
>> applied, we're seeing CI failures on lots of tests from this:
>>
>> https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3295
>
>Wow, truly sorry, completely missed that BAT failures were due to this
>patch. How about we just revert this patch for now and redo it later?
>Unless you or Lucas know how to fix this immediately (I don't).

The fix is easy: update calculate_regset_size(). But I don't like
polluting xe_guc_ads.c. If the registers were part of reg_sr.xa, you
wouldn't need that, since the loop already counts them.

I'm ok with reverting it.


Lucas De Marchi

>
>Thanks.
>--
>Ashutosh

* Re: [PATCH v3] drm/xe/xe_guc_ads: save/restore OA registers
  2024-10-29 19:33       ` Matt Roper
  2024-10-29 19:44         ` Dixit, Ashutosh
  2024-10-29 19:46         ` Lucas De Marchi
@ 2024-10-29 21:19         ` Umesh Nerlige Ramappa
  2 siblings, 0 replies; 14+ messages in thread
From: Umesh Nerlige Ramappa @ 2024-10-29 21:19 UTC (permalink / raw)
  To: Matt Roper
  Cc: Lucas De Marchi, Dixit, Ashutosh, Jonathan Cavitt, intel-xe,
	saurabhg.gupta, alex.zuo, john.c.harrison, stable

On Tue, Oct 29, 2024 at 12:33:13PM -0700, Matt Roper wrote:
>On Tue, Oct 29, 2024 at 12:32:54PM -0500, Lucas De Marchi wrote:
>> On Tue, Oct 29, 2024 at 10:15:54AM -0700, Ashutosh Dixit wrote:
>> > On Tue, 29 Oct 2024 09:23:49 -0700, Lucas De Marchi wrote:
>> > >
>> > > On Wed, Oct 23, 2024 at 08:07:15PM +0000, Jonathan Cavitt wrote:
>> > > > Several OA registers and allowlist registers were missing from the
>> > > > save/restore list for GuC and could be lost during an engine reset.  Add
>> > > > them to the list.
>> > > >
>> > > > v2:
>> > > > - Fix commit message (Umesh)
>> > > > - Add missing closes (Ashutosh)
>> > > >
>> > > > v3:
>> > > > - Add missing fixes (Ashutosh)
>> > > >
>> > > > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2249
>> > > > Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
>> > > > Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> > > > Suggested-by: John Harrison <john.c.harrison@intel.com>
>> > > > Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
>> > > > CC: stable@vger.kernel.org # v6.11+
>> > > > Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>> > > > Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> > > > ---
>> > > > drivers/gpu/drm/xe/xe_guc_ads.c | 14 ++++++++++++++
>> > > > 1 file changed, 14 insertions(+)
>> > > >
>> > > > diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c
>> > > > index 4e746ae98888..a196c4fb90fc 100644
>> > > > --- a/drivers/gpu/drm/xe/xe_guc_ads.c
>> > > > +++ b/drivers/gpu/drm/xe/xe_guc_ads.c
>> > > > @@ -15,6 +15,7 @@
>> > > > #include "regs/xe_engine_regs.h"
>> > > > #include "regs/xe_gt_regs.h"
>> > > > #include "regs/xe_guc_regs.h"
>> > > > +#include "regs/xe_oa_regs.h"
>> > > > #include "xe_bo.h"
>> > > > #include "xe_gt.h"
>> > > > #include "xe_gt_ccs_mode.h"
>> > > > @@ -740,6 +741,11 @@ static unsigned int guc_mmio_regset_write(struct xe_guc_ads *ads,
>> > > >		guc_mmio_regset_write_one(ads, regset_map, e->reg, count++);
>> > > >	}
>> > > >
>> > > > +	for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++)
>> > > > +		guc_mmio_regset_write_one(ads, regset_map,
>> > > > +					  RING_FORCE_TO_NONPRIV(hwe->mmio_base, i),
>> > > > +					  count++);
>> > >
>> > > this is not the proper place. See drivers/gpu/drm/xe/xe_reg_whitelist.c.
>> >
>> > Yikes, this got merged yesterday.
>> >
>> > >
>> > > The loop just before these added lines should be sufficient to go over
>> > > all engine save/restore register and give them to guc.
>> >
>> > You probably mean this one?
>> >
>> > 	xa_for_each(&hwe->reg_sr.xa, idx, entry)
>> > 		guc_mmio_regset_write_one(ads, regset_map, entry->reg, count++);
>> >
>> > But then how come this patch fixed GL #2249?
>>
>> it fixes, it just doesn't put it in the right place according to the
>> driver arch. Whitelists should be in that other file so it shows up in
>> debugfs, (/sys/kernel/debug/dri/*/*/register-save-restore), detect
>> clashes when we try to add the same register, etc.
>
>Also, this patch failed pre-merge BAT since it added new regset entries
>that we never actually allocated storage space for.  Now that it's been
>applied, we're seeing CI failures on lots of tests from this:
>
>https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3295
>

Sorry, I didn't fully understand how this works in the Xe KMD.

Does this mean that we should just add entries to the
register_whitelist[] array in xe_reg_whitelist.c, or should they be added
to hwe->reg_sr using the xe_rtp_process_to_sr() interface? What is the
difference between the two approaches, and when should each be used?

Thanks,
Umesh
>
>Matt
>
>>
>>
>> Lucas De Marchi
>>
>> >
>> > Ashutosh
>
>-- 
>Matt Roper
>Graphics Software Engineer
>Linux GPU Platform Enablement
>Intel Corporation

Thread overview: 14+ messages
2024-10-23 20:07 [PATCH v3] drm/xe/xe_guc_ads: save/restore OA registers Jonathan Cavitt
2024-10-28 16:36 ` Dixit, Ashutosh
2024-10-28 20:38   ` Umesh Nerlige Ramappa
2024-10-28 20:48     ` Dixit, Ashutosh
2024-10-28 22:49 ` Dixit, Ashutosh
2024-10-29 16:23 ` Lucas De Marchi
2024-10-29 17:15   ` Dixit, Ashutosh
2024-10-29 17:32     ` Lucas De Marchi
2024-10-29 19:33       ` Matt Roper
2024-10-29 19:44         ` Dixit, Ashutosh
2024-10-29 19:50           ` Lucas De Marchi
2024-10-29 19:46         ` Lucas De Marchi
2024-10-29 21:19         ` Umesh Nerlige Ramappa
2024-10-29 19:38       ` Dixit, Ashutosh