Intel-GFX Archive on lore.kernel.org
* [Intel-gfx] Small bar recovery vs compressed content on DG2
@ 2022-03-16  7:25 Thomas Hellström
  2022-03-17  8:43 ` Joonas Lahtinen
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Hellström @ 2022-03-16  7:25 UTC (permalink / raw)
  To: Matthew Auld, Joonas Lahtinen, Bloomfield, Jon,
	Intel Graphics Development, Ramalingam C

Hi!

Do we somehow need to clarify in the headers the semantics for this?

From my understanding when discussing the CCS migration series with 
Ram, the kernel will never do any resolving (compressing / 
decompressing) migrations or evictions, which basically implies the 
following:

*) Compressed data must have LMEM-only placement, otherwise the GPU 
would read garbage if accessing from SMEM.
*) Compressed data can't be assumed to be mappable by the CPU, because 
ensuring that on small BAR requires the placement to be LMEM+SMEM.
*) Neither can compressed data be part of a CAPTURE buffer, because that 
requires the data to be CPU-mappable.

Are we (and user-mode drivers) OK with these restrictions, or do we need 
to rethink?
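
As a rough illustration of how the restrictions listed above clash, here is a minimal sketch in Python. It is a toy model only: `BufferRequest`, `cpu_mappable_guaranteed` and `violations` are hypothetical names made up for this example and do not correspond to any i915 structure or uAPI.

```python
from dataclasses import dataclass

LMEM = "lmem"
SMEM = "smem"


@dataclass(frozen=True)
class BufferRequest:
    """Hypothetical description of a buffer object, not an i915 structure."""
    placements: frozenset
    compressed: bool = False
    capture: bool = False


def cpu_mappable_guaranteed(req: BufferRequest, small_bar: bool) -> bool:
    # With a full-size BAR all of LMEM is CPU-visible. On small BAR the
    # guarantee only holds if the object may spill to system memory.
    return (not small_bar) or (SMEM in req.placements)


def violations(req: BufferRequest, small_bar: bool) -> list:
    found = []
    # The kernel never resolves, so compressed data must stay in LMEM.
    if req.compressed and req.placements != frozenset({LMEM}):
        found.append("compressed data needs LMEM-only placement")
    # Capture needs the data to be CPU-mappable.
    if req.capture and not cpu_mappable_guaranteed(req, small_bar):
        found.append("CAPTURE needs guaranteed CPU-mappable data")
    return found


lmem_only = BufferRequest(frozenset({LMEM}), compressed=True, capture=True)
lmem_smem = BufferRequest(frozenset({LMEM, SMEM}), compressed=True, capture=True)

for req in (lmem_only, lmem_smem):
    print(sorted(req.placements), violations(req, small_bar=True))
```

In this model neither placement choice satisfies both rules for a compressed CAPTURE buffer on small BAR, which is the conflict the question is about.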

Thanks,

Thomas




* Re: [Intel-gfx] Small bar recovery vs compressed content on DG2
  2022-03-16  7:25 [Intel-gfx] Small bar recovery vs compressed content on DG2 Thomas Hellström
@ 2022-03-17  8:43 ` Joonas Lahtinen
  2022-03-17  9:29   ` Matthew Auld
  2022-03-17  9:35   ` Thomas Hellström
  0 siblings, 2 replies; 11+ messages in thread
From: Joonas Lahtinen @ 2022-03-17  8:43 UTC (permalink / raw)
  To: Bloomfield, Jon, Thomas Hellström,
	Intel Graphics Development, Matthew Auld, Ramalingam C

Quoting Thomas Hellström (2022-03-16 09:25:16)
> Hi!
> 
> Do we somehow need to clarify in the headers the semantics for this?
> 
>  From my understanding when discussing the CCS migration series with 
> Ram, the kernel will never do any resolving (compressing / 
> decompressing) migrations or evictions which basically implies the 
> following:
> 
> *) Compressed data must have LMEM only placement, otherwise the GPU 
> would read garbage if accessing from SMEM.

This has always been the case, so it should be documented in the uAPI
headers and kerneldocs.

> *) Compressed data can't be assumed to be mappable by the CPU, because 
> in order to ensure that on small BAR, the placement needs to be LMEM+SMEM.

Not strictly true, as we could always migrate to the mappable region in
the CPU fault handler. We will need the same set of tricks as with the
limited mappable GGTT in the past.

> *) Neither can compressed data be part of a CAPTURE buffer, because that 
> requires the data to be CPU-mappable.

This in particular would be too big a limitation, one we can't really
afford when it comes to debugging.

Regards, Joonas


* Re: [Intel-gfx] Small bar recovery vs compressed content on DG2
  2022-03-17  8:43 ` Joonas Lahtinen
@ 2022-03-17  9:29   ` Matthew Auld
  2022-03-17  9:35   ` Thomas Hellström
  1 sibling, 0 replies; 11+ messages in thread
From: Matthew Auld @ 2022-03-17  9:29 UTC (permalink / raw)
  To: Joonas Lahtinen, Bloomfield, Jon, Thomas Hellström,
	Intel Graphics Development, Ramalingam C

On 17/03/2022 08:43, Joonas Lahtinen wrote:
> Quoting Thomas Hellström (2022-03-16 09:25:16)
>> Hi!
>>
>> Do we somehow need to clarify in the headers the semantics for this?
>>
>>   From my understanding when discussing the CCS migration series with
>> Ram, the kernel will never do any resolving (compressing /
>> decompressing) migrations or evictions which basically implies the
>> following:
>>
>> *) Compressed data must have LMEM only placement, otherwise the GPU
>> would read garbage if accessing from SMEM.
> 
> This has always been the case, so it should be documented in the uAPI
> headers and kerneldocs.
> 
>> *) Compressed data can't be assumed to be mappable by the CPU, because
>> in order to ensure that on small BAR, the placement needs to be LMEM+SMEM.
> 
> Not strictly true, as we could always migrate to the mappable region in
> the CPU fault handler. Will need the same set of tricks as with limited
> mappable GGTT in past.

With the proposed I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS hint[1], it 
looks like by design we always require lmem + smem, with the idea that 
we can always spill to system memory if needed. So I guess explicit 
NEEDS_CPU_ACCESS + compression is not supported, is this the expected 
behaviour?

[1] https://patchwork.freedesktop.org/patch/475061/


* Re: [Intel-gfx] Small bar recovery vs compressed content on DG2
  2022-03-17  8:43 ` Joonas Lahtinen
  2022-03-17  9:29   ` Matthew Auld
@ 2022-03-17  9:35   ` Thomas Hellström
  2022-03-17 18:21     ` Bloomfield, Jon
  1 sibling, 1 reply; 11+ messages in thread
From: Thomas Hellström @ 2022-03-17  9:35 UTC (permalink / raw)
  To: Joonas Lahtinen, Bloomfield, Jon, Intel Graphics Development,
	Matthew Auld, Ramalingam C

On Thu, 2022-03-17 at 10:43 +0200, Joonas Lahtinen wrote:
> Quoting Thomas Hellström (2022-03-16 09:25:16)
> > Hi!
> > 
> > Do we somehow need to clarify in the headers the semantics for
> > this?
> > 
> >  From my understanding when discussing the CCS migration series
> > with 
> > Ram, the kernel will never do any resolving (compressing / 
> > decompressing) migrations or evictions which basically implies the 
> > following:
> > 
> > *) Compressed data must have LMEM only placement, otherwise the GPU
> > would read garbage if accessing from SMEM.
> 
> This has always been the case, so it should be documented in the uAPI
> headers and kerneldocs.
> 
> > *) Compressed data can't be assumed to be mappable by the CPU,
> > because 
> > in order to ensure that on small BAR, the placement needs to be
> > LMEM+SMEM.
> 
> Not strictly true, as we could always migrate to the mappable region
> in
> the CPU fault handler. Will need the same set of tricks as with
> limited
> mappable GGTT in past.

In addition to Matt's reply:

Yes, if there is sufficient space. I'm not sure we want to complicate
this to migrate only part of the buffer to mappable on a fault basis?
Otherwise this is likely to fail.

One option is to allow cpu-mapping from SYSTEM like TTM is doing for
evicted buffers, even if SYSTEM is not in the placement list, and then
migrate back to LMEM for gpu access.
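
To make the "if there is sufficient space" caveat concrete, here is a minimal sketch of the decision a CPU fault on an LMEM-only object would face on small BAR. It is a toy model under stated assumptions: `cpu_fault_source` and its parameters are hypothetical, sizes are plain byte counts, and partial migration of a buffer is deliberately not modelled.

```python
from typing import Optional

MiB = 1 << 20


def cpu_fault_source(obj_size: int, mappable_size: int, mappable_free: int,
                     allow_system: bool) -> Optional[str]:
    """Where a whole-object CPU mapping could be served from, if anywhere."""
    if obj_size <= mappable_free:
        # Fits into the CPU-visible part of LMEM right away.
        return "mappable LMEM"
    if obj_size <= mappable_size:
        # Could fit, but only by spilling out other objects first.
        return "mappable LMEM after evicting other objects"
    if allow_system:
        # The TTM-like option: map from SYSTEM, migrate back for GPU access.
        return "SYSTEM"
    # Larger than the whole mappable window and no fallback: the fault fails.
    return None


print(cpu_fault_source(64 * MiB, 256 * MiB, 128 * MiB, allow_system=False))
print(cpu_fault_source(200 * MiB, 256 * MiB, 128 * MiB, allow_system=False))
print(cpu_fault_source(512 * MiB, 256 * MiB, 128 * MiB, allow_system=False))
print(cpu_fault_source(512 * MiB, 256 * MiB, 128 * MiB, allow_system=True))
```

The third call returns `None`, which is the failure case: an object larger than the mappable window cannot be migrated there as a whole, no matter how much is evicted.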

But can user-space even interpret the compressed data when CPU-mapping,
without access to the CCS metadata?

> 
> > *) Neither can compressed data be part of a CAPTURE buffer, because
> > that 
> > requires the data to be CPU-mappable.
> 
> Especially this will be too big of a limitation which we can't really
> afford
> when it comes to debugging.

Same here WRT user-space interpretation. 

This will become especially tricky on small BAR, because either we need
to fit all compressed buffers in the mappable portion, or be able to
blit the contents of the capture buffers from within the fence
signalling critical section, which will require a lot of work I guess.

/Thomas



* Re: [Intel-gfx] Small bar recovery vs compressed content on DG2
  2022-03-17  9:35   ` Thomas Hellström
@ 2022-03-17 18:21     ` Bloomfield, Jon
  2022-03-18  9:48       ` Thomas Hellström
  0 siblings, 1 reply; 11+ messages in thread
From: Bloomfield, Jon @ 2022-03-17 18:21 UTC (permalink / raw)
  To: Thomas Hellström, Joonas Lahtinen,
	Intel Graphics Development, Auld, Matthew, C, Ramalingam,
	Vetter, Daniel

+@Vetter, Daniel

Let's not start re-inventing this on the fly again. That's how we got into trouble in the past. The SAS/Whitepaper does currently require the SMEM+LMEM placement for mappable, for good reasons.

We cannot 'always migrate to mappable in the fault handler'. Or at least, this is not as trivial as it is to write in a sentence due to the need to spill out other active objects, and all the usual challenges with context synchronization etc. It is possible, perhaps with a lot of care, but it is challenging to guarantee, easy to break, and not needed for 99.9% of software. We are trying to simplify our driver stack.

If we need a special mechanism for debug, we should devise a special mechanism, not throw out the general LMEM+SMEM requirement. Are there any identified first-class clients that require such access, or is it only debugging tools?

If only debug, then why can't the tool use a copy engine submission to access the data in place? Or perhaps a bespoke ioctl to access this via the KMD (and kmd submitted copy-engine BB)?

Thanks,

Jon


* Re: [Intel-gfx] Small bar recovery vs compressed content on DG2
  2022-03-17 18:21     ` Bloomfield, Jon
@ 2022-03-18  9:48       ` Thomas Hellström
  2022-03-18 16:25         ` Bloomfield, Jon
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Hellström @ 2022-03-18  9:48 UTC (permalink / raw)
  To: Bloomfield, Jon, Joonas Lahtinen, Intel Graphics Development,
	Auld, Matthew, C, Ramalingam, Vetter, Daniel

Hi,

On Thu, 2022-03-17 at 18:21 +0000, Bloomfield, Jon wrote:
> +@Vetter, Daniel
> 
> Let's not start re-inventing this on the fly again. That's how we got
> into trouble in the past. The SAS/Whitepaper does currently require
> the SMEM+LMEM placement for mappable, for good reasons.

Just to avoid any misunderstandings here:

We have two hard requirements from Arch that clash; the main problem is
that compressed BOs can't be captured on error with the current designs.

From an engineering point of view we can do little more than list the
available options for resolving this and whether they are hard or not so
hard to implement. But IMHO Arch needs to agree on what's got to
give.

Thanks,
Thomas



* Re: [Intel-gfx] Small bar recovery vs compressed content on DG2
  2022-03-18  9:48       ` Thomas Hellström
@ 2022-03-18 16:25         ` Bloomfield, Jon
  2022-03-18 18:12           ` Daniel Vetter
  0 siblings, 1 reply; 11+ messages in thread
From: Bloomfield, Jon @ 2022-03-18 16:25 UTC (permalink / raw)
  To: Thomas Hellström, Joonas Lahtinen,
	Intel Graphics Development, Auld, Matthew, C, Ramalingam,
	Vetter, Daniel

@Thomas Hellström - I agree :-)

My question was really to @Joonas Lahtinen, who was saying we could always migrate in the CPU fault handler. I am pushing back on that unless we have no choice. It's the very complication we were trying to avoid with the current SAS. If that's what's needed, then so be it. But I'm asking whether we can handle this as a special case rather than adding generic complexity to the primary code paths.

Jon


* Re: [Intel-gfx] Small bar recovery vs compressed content on DG2
  2022-03-18 16:25         ` Bloomfield, Jon
@ 2022-03-18 18:12           ` Daniel Vetter
  2022-03-21  6:53             ` Thomas Hellström
  2022-03-31  9:25             ` Matthew Auld
  0 siblings, 2 replies; 11+ messages in thread
From: Daniel Vetter @ 2022-03-18 18:12 UTC (permalink / raw)
  To: Bloomfield, Jon, Kenneth W Graunke, Lionel Landwerlin
  Cc: Thomas Hellström, Intel Graphics Development, dri-devel,
	Auld, Matthew, Vetter, Daniel

Maybe also good to add dri-devel to these discussions.

I'm not sure where exactly we landed with dgpu error capture (maybe I
should check the code but it's really w/e here), but I think we can
also toss in "you need a non-recoverable context for error capture to
work on dgpu". Since that simplifies things even more. Maybe Thomas
forgot to add that to the list of restrictions.

Anyway on the "we can't capture lmem-only compressed buffers", I think
that's totally fine. Those are for render targets, and we don't
capture those. Adding Lionel and Ken to confirm.
-Daniel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [Intel-gfx] Small bar recovery vs compressed content on DG2
  2022-03-18 18:12           ` Daniel Vetter
@ 2022-03-21  6:53             ` Thomas Hellström
  2022-03-31  9:25             ` Matthew Auld
  1 sibling, 0 replies; 11+ messages in thread
From: Thomas Hellström @ 2022-03-21  6:53 UTC (permalink / raw)
  To: Daniel Vetter, Bloomfield, Jon, Kenneth W Graunke,
	Lionel Landwerlin
  Cc: Intel Graphics Development, dri-devel, Auld, Matthew,
	Vetter, Daniel


On 3/18/22 19:12, Daniel Vetter wrote:
> Maybe also good to add dri-devel to these discussions.
>
> I'm not sure where exactly we landed with dgpu error capture (maybe I
> should check the code but it's really w/e here), but I think we can
> also toss in "you need a non-recoverable context for error capture to
> work on dgpu".

Error capture now works even with multiple pipelined migrations: it simply
looks at the vma_resource, which holds the metadata snapshot taken when the
request was queued, instead of at the vma, so there are no additional
restrictions. (I also fixed up the gfp mode and completely avoided the
contiguous allocations for the page directories.)
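
To illustrate the snapshot idea, here is a toy Python model. It is not i915
code: the names Vma, VmaResource, Request, placement and offset are invented
for this sketch. Only the principle is taken from the description above:
capture reads the binding metadata frozen when the request was queued, so
migrations pipelined afterwards cannot change what the capture sees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmaResource:
    """Immutable snapshot of binding metadata (hypothetical fields)."""
    placement: str
    offset: int


@dataclass
class Vma:
    """Live binding state; it changes whenever the object migrates."""
    placement: str
    offset: int

    def snapshot(self) -> VmaResource:
        return VmaResource(self.placement, self.offset)

    def migrate(self, placement: str, offset: int) -> None:
        self.placement = placement
        self.offset = offset


@dataclass
class Request:
    """A queued request keeps the snapshot taken at queue time."""
    resource: VmaResource


def queue_request(vma: Vma) -> Request:
    return Request(resource=vma.snapshot())


def capture_metadata(request: Request) -> VmaResource:
    # Read the queue-time snapshot, never the live vma.
    return request.resource


vma = Vma(placement="lmem", offset=0x1000)
request = queue_request(vma)
vma.migrate("smem", 0x8000)  # first pipelined migration
vma.migrate("lmem", 0x4000)  # second pipelined migration
captured = capture_metadata(request)
```

Here the live vma ends up at a different offset after two migrations, while
the captured metadata still describes the binding the request actually used.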

> Since that simplifies things even more. Maybe Thomas
> forgot to add that to the list of restrictions.
>
> Anyway on the "we can't capture lmem-only compressed buffers", I think
> that's totally fine. Those are for render targets, and we don't
> capture those. Adding Lionel and Ken to confirm.
OK.
> -Daniel
>
> On Fri, 18 Mar 2022 at 17:26, Bloomfield, Jon <jon.bloomfield@intel.com> wrote:
>> @Thomas Hellström - I agree :-)
>>
>> My question was really to @Joonas Lahtinen, who was saying we could always migrate in the CPU fault handler. I am pushing back on that unless we have no choice. It's the very complication we were trying to avoid with the current SAS. If that's what's needed, then so be it. But I'm asking whether we can instead handle this specially, instead of adding generic complexity to the primary code paths.
>>
>> Jon
>>
>>> -----Original Message-----
>>> From: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> Sent: Friday, March 18, 2022 2:48 AM
>>> To: Bloomfield, Jon <jon.bloomfield@intel.com>; Joonas Lahtinen
>>> <joonas.lahtinen@linux.intel.com>; Intel Graphics Development <intel-
>>> gfx@lists.freedesktop.org>; Auld, Matthew <matthew.auld@intel.com>; C,
>>> Ramalingam <ramalingam.c@intel.com>; Vetter, Daniel
>>> <daniel.vetter@intel.com>
>>> Subject: Re: Small bar recovery vs compressed content on DG2
>>>
>>> Hi,
>>>
>>> On Thu, 2022-03-17 at 18:21 +0000, Bloomfield, Jon wrote:
>>>> +@Vetter, Daniel
>>>>
>>>> Let's not start re-inventing this on the fly again. That's how we got
>>>> into trouble in the past. The SAS/Whitepaper does currently require
>>>> the SMEM+LMEM placement for mappable, for good reasons.
>>> Just to avoid any misunderstandings here:
>>>
>>> We have two hard requirements from Arch that clash, main problem is
>>> compressed bos can't be captured on error with current designs.
>>>
>>>  From an engineering point of view we can do little more than list
>>> options available to resolve this and whether they are hard or not so
>>> hard to implement. But IMHO Arch needs to agree on what's got to
>>> give.
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>>> We cannot 'always migrate to mappable in the fault handler'. Or at
>>>> least, this is not as trivial as it is to write in a sentence due to
>>>> the need to spill out other active objects, and all the usual
>>>> challenges with context synchronization etc. It is possible, perhaps
>>>> with a lot of care, but it is challenging to guarantee, easy to
>>>> break, and not needed for 99.9% of software. We are trying to
>>>> simplify our driver stack.
>>>>
>>>> If we need a special mechanism for debug, we should devise a special
>>>> mechanism, not throw out the general LMEM+SMEM requirement. Are
>>> there
>>>> any identified first-class clients that require such access, or is it
>>>> only debugging tools?
>>>>
>>>> If only debug, then why can't the tool use a copy engine submission
>>>> to access the data in place? Or perhaps a bespoke ioctl to access
>>>> this via the KMD (and kmd submitted copy-engine BB)?
>>>>
>>>> Thanks,
>>>>
>>>> Jon
>>>>
>>>>> -----Original Message-----
>>>>> From: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>>> Sent: Thursday, March 17, 2022 2:35 AM
>>>>> To: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>; Bloomfield,
>>>>> Jon
>>>>> <jon.bloomfield@intel.com>; Intel Graphics Development <intel-
>>>>> gfx@lists.freedesktop.org>; Auld, Matthew <matthew.auld@intel.com>;
>>>>> C,
>>>>> Ramalingam <ramalingam.c@intel.com>
>>>>> Subject: Re: Small bar recovery vs compressed content on DG2
>>>>>
>>>>> On Thu, 2022-03-17 at 10:43 +0200, Joonas Lahtinen wrote:
>>>>>> Quoting Thomas Hellström (2022-03-16 09:25:16)
>>>>>>> Hi!
>>>>>>>
>>>>>>> Do we somehow need to clarify in the headers the semantics for
>>>>>>> this?
>>>>>>>
>>>>>>>   From my understanding when discussing the CCS migration series
>>>>>>> with
>>>>>>> Ram, the kernel will never do any resolving (compressing /
>>>>>>> decompressing) migrations or evictions which basically implies
>>>>>>> the
>>>>>>> following:
>>>>>>>
>>>>>>> *) Compressed data must have LMEM only placement, otherwise the
>>>>> GPU
>>>>>>> would read garbage if accessing from SMEM.
>>>>>> This has always been the case, so it should be documented in the
>>>>>> uAPI
>>>>>> headers and kerneldocs.
>>>>>>
>>>>>>> *) Compressed data can't be assumed to be mappable by the CPU,
>>>>>>> because
>>>>>>> in order to ensure that on small BAR, the placement needs to be
>>>>>>> LMEM+SMEM.
>>>>>> Not strictly true, as we could always migrate to the mappable
>>>>>> region
>>>>>> in
>>>>>> the CPU fault handler. Will need the same set of tricks as with
>>>>>> limited
>>>>>> mappable GGTT in past.
>>>>> In addition to Matt's reply:
>>>>>
>>>>> Yes, if there is sufficient space. I'm not sure we want to
>>>>> complicate
>>>>> this to migrate only part of the buffer to mappable on a fault
>>>>> basis?
>>>>> Otherwise this is likely to fail.
>>>>>
>>>>> One option is to allow cpu-mapping from SYSTEM like TTM is doing
>>>>> for
>>>>> evicted buffers, even if SYSTEM is not in the placement list, and
>>>>> then
>>>>> migrate back to LMEM for gpu access.
>>>>>
>>>>> But can user-space even interpret the compressed data when CPU-
>>>>> mapping?
>>>>> without access to the CCS metadata?
>>>>>
>>>>>>> *) Neither can compressed data be part of a CAPTURE buffer,
>>>>>>> because
>>>>>>> that
>>>>>>> requires the data to be CPU-mappable.
>>>>>> Especially this will be too big of a limitation which we can't
>>>>>> really
>>>>>> afford
>>>>>> when it comes to debugging.
>>>>> Same here WRT user-space interpretation.
>>>>>
>>>>> This will become especially tricky on small BAR, because either we
>>>>> need
>>>>> to fit all compressed buffers in the mappable portion, or be able
>>>>> to
>>>>> blit the contents of the capture buffers from within the fence
>>>>> signalling critical section, which will require a lot of work I
>>>>> guess.
>>>>>
>>>>> /Thomas
>>>>>
>>>>>
>>>>>> Regards, Joonas
>>>>>>
>>>>>>> Are we (and user-mode drivers) OK with these restrictions, or
>>>>>>> do we
>>>>>>> need
>>>>>>> to rethink?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>>
>


* Re: [Intel-gfx] Small bar recovery vs compressed content on DG2
  2022-03-18 18:12           ` Daniel Vetter
  2022-03-21  6:53             ` Thomas Hellström
@ 2022-03-31  9:25             ` Matthew Auld
  2022-04-04  9:04               ` Thomas Hellström
  1 sibling, 1 reply; 11+ messages in thread
From: Matthew Auld @ 2022-03-31  9:25 UTC (permalink / raw)
  To: Daniel Vetter, Bloomfield, Jon, Kenneth W Graunke,
	Lionel Landwerlin
  Cc: Thomas Hellström, Intel Graphics Development, dri-devel,
	Kenneth Graunke, Vetter, Daniel

On 18/03/2022 18:12, Daniel Vetter wrote:
> Maybe also good to add dri-devel to these discussions.
> 
> I'm not sure where exactly we landed with dgpu error capture (maybe I
> should check the code but it's really w/e here), but I think we can
> also toss in "you need a non-recoverable context for error capture to
> work on dgpu". Since that simplifies things even more. Maybe Thomas
> forgot to add that to the list of restrictions.
> 
> Anyway on the "we can't capture lmem-only compressed buffers", I think
> that's totally fine. Those are for render targets, and we don't
> capture those. Adding Lionel and Ken to confirm.

Ken, Lionel: gentle ping on this?

> -Daniel
> 
> On Fri, 18 Mar 2022 at 17:26, Bloomfield, Jon <jon.bloomfield@intel.com> wrote:
>>
>> @Thomas Hellström - I agree :-)
>>
>> My question was really to @Joonas Lahtinen, who was saying we could always migrate in the CPU fault handler. I am pushing back on that unless we have no choice. It's the very complication we were trying to avoid with the current SAS. If that's what's needed, then so be it. But I'm asking whether we can instead handle this specially, instead of adding generic complexity to the primary code paths.
>>
>> Jon
>>
>>> -----Original Message-----
>>> From: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> Sent: Friday, March 18, 2022 2:48 AM
>>> To: Bloomfield, Jon <jon.bloomfield@intel.com>; Joonas Lahtinen
>>> <joonas.lahtinen@linux.intel.com>; Intel Graphics Development <intel-
>>> gfx@lists.freedesktop.org>; Auld, Matthew <matthew.auld@intel.com>; C,
>>> Ramalingam <ramalingam.c@intel.com>; Vetter, Daniel
>>> <daniel.vetter@intel.com>
>>> Subject: Re: Small bar recovery vs compressed content on DG2
>>>
>>> Hi,
>>>
>>> On Thu, 2022-03-17 at 18:21 +0000, Bloomfield, Jon wrote:
>>>> +@Vetter, Daniel
>>>>
>>>> Let's not start re-inventing this on the fly again. That's how we got
>>>> into trouble in the past. The SAS/Whitepaper does currently require
>>>> the SMEM+LMEM placement for mappable, for good reasons.
>>>
>>> Just to avoid any misunderstandings here:
>>>
>>> We have two hard requirements from Arch that clash, main problem is
>>> compressed bos can't be captured on error with current designs.
>>>
>>>  From an engineering point of view we can do little more than list
>>> options available to resolve this and whether they are hard or not so
>>> hard to implement. But IMHO Arch needs to agree on what's got to
>>> give.
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>>>
>>>> We cannot 'always migrate to mappable in the fault handler'. Or at
>>>> least, this is not as trivial as it is to write in a sentence due to
>>>> the need to spill out other active objects, and all the usual
>>>> challenges with context synchronization etc. It is possible, perhaps
>>>> with a lot of care, but it is challenging to guarantee, easy to
>>>> break, and not needed for 99.9% of software. We are trying to
>>>> simplify our driver stack.
>>>>
>>>> If we need a special mechanism for debug, we should devise a special
>>>> mechanism, not throw out the general LMEM+SMEM requirement. Are
>>> there
>>>> any identified first-class clients that require such access, or is it
>>>> only debugging tools?
>>>>
>>>> If only debug, then why can't the tool use a copy engine submission
>>>> to access the data in place? Or perhaps a bespoke ioctl to access
>>>> this via the KMD (and kmd submitted copy-engine BB)?
>>>>
>>>> Thanks,
>>>>
>>>> Jon
>>>>
>>>>> -----Original Message-----
>>>>> From: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>>> Sent: Thursday, March 17, 2022 2:35 AM
>>>>> To: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>; Bloomfield,
>>>>> Jon
>>>>> <jon.bloomfield@intel.com>; Intel Graphics Development <intel-
>>>>> gfx@lists.freedesktop.org>; Auld, Matthew <matthew.auld@intel.com>;
>>>>> C,
>>>>> Ramalingam <ramalingam.c@intel.com>
>>>>> Subject: Re: Small bar recovery vs compressed content on DG2
>>>>>
>>>>> On Thu, 2022-03-17 at 10:43 +0200, Joonas Lahtinen wrote:
>>>>>> Quoting Thomas Hellström (2022-03-16 09:25:16)
>>>>>>> Hi!
>>>>>>>
>>>>>>> Do we somehow need to clarify in the headers the semantics for
>>>>>>> this?
>>>>>>>
>>>>>>>   From my understanding when discussing the CCS migration series
>>>>>>> with
>>>>>>> Ram, the kernel will never do any resolving (compressing /
>>>>>>> decompressing) migrations or evictions which basically implies
>>>>>>> the
>>>>>>> following:
>>>>>>>
>>>>>>> *) Compressed data must have LMEM only placement, otherwise the
>>>>> GPU
>>>>>>> would read garbage if accessing from SMEM.
>>>>>>
>>>>>> This has always been the case, so it should be documented in the
>>>>>> uAPI
>>>>>> headers and kerneldocs.
>>>>>>
>>>>>>> *) Compressed data can't be assumed to be mappable by the CPU,
>>>>>>> because
>>>>>>> in order to ensure that on small BAR, the placement needs to be
>>>>>>> LMEM+SMEM.
>>>>>>
>>>>>> Not strictly true, as we could always migrate to the mappable
>>>>>> region
>>>>>> in
>>>>>> the CPU fault handler. Will need the same set of tricks as with
>>>>>> limited
>>>>>> mappable GGTT in past.
>>>>>
>>>>> In addition to Matt's reply:
>>>>>
>>>>> Yes, if there is sufficient space. I'm not sure we want to
>>>>> complicate
>>>>> this to migrate only part of the buffer to mappable on a fault
>>>>> basis?
>>>>> Otherwise this is likely to fail.
>>>>>
>>>>> One option is to allow cpu-mapping from SYSTEM like TTM is doing
>>>>> for
>>>>> evicted buffers, even if SYSTEM is not in the placement list, and
>>>>> then
>>>>> migrate back to LMEM for gpu access.
>>>>>
>>>>> But can user-space even interpret the compressed data when CPU-
>>>>> mapping?
>>>>> without access to the CCS metadata?
>>>>>
>>>>>>
>>>>>>> *) Neither can compressed data be part of a CAPTURE buffer,
>>>>>>> because
>>>>>>> that
>>>>>>> requires the data to be CPU-mappable.
>>>>>>
>>>>>> Especially this will be too big of a limitation which we can't
>>>>>> really
>>>>>> afford
>>>>>> when it comes to debugging.
>>>>>
>>>>> Same here WRT user-space interpretation.
>>>>>
>>>>> This will become especially tricky on small BAR, because either we
>>>>> need
>>>>> to fit all compressed buffers in the mappable portion, or be able
>>>>> to
>>>>> blit the contents of the capture buffers from within the fence
>>>>> signalling critical section, which will require a lot of work I
>>>>> guess.
>>>>>
>>>>> /Thomas
>>>>>
>>>>>
>>>>>>
>>>>>> Regards, Joonas
>>>>>>
>>>>>>> Are we (and user-mode drivers) OK with these restrictions, or
>>>>>>> do we
>>>>>>> need
>>>>>>> to rethink?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
> 
> 


* Re: [Intel-gfx] Small bar recovery vs compressed content on DG2
  2022-03-31  9:25             ` Matthew Auld
@ 2022-04-04  9:04               ` Thomas Hellström
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Hellström @ 2022-04-04  9:04 UTC (permalink / raw)
  To: Matthew Auld, Daniel Vetter, Bloomfield, Jon, Kenneth W Graunke,
	Lionel Landwerlin
  Cc: Intel Graphics Development, dri-devel, Kenneth Graunke,
	Vetter, Daniel

On Thu, 2022-03-31 at 10:25 +0100, Matthew Auld wrote:
> On 18/03/2022 18:12, Daniel Vetter wrote:
> > Maybe also good to add dri-devel to these discussions.
> > 
> > I'm not sure where exactly we landed with dgpu error capture (maybe
> > I
> > should check the code but it's really w/e here), but I think we can
> > also toss in "you need a non-recoverable context for error capture
> > to
> > work on dgpu". Since that simplifies things even more. Maybe Thomas
> > forgot to add that to the list of restrictions.

Not sure whether we reached a conclusion here, but after discussing
with Daniel in another thread, what about:

1) Reject error capture on recoverable contexts. That means we are free
to implement reasonable error capture from outside the fence signalling
critical path moving forward. Makes it easier to blit buffer content.

2) No additional restrictions on capture buffers. They are best effort
anyway. If they are not mappable, they don't end up in the error log
for now (affects only small BAR systems). Moving forward we can blit
the content to system or mappable LMEM for capture once the gpu reset
has completed.
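
As a rough illustration of the two points, here is a toy Python model. It is
not i915 code: Buffer, CaptureRejected, validate_submission and
buffers_in_error_log are hypothetical names, and rejecting at submission time
is an assumption of this sketch rather than a settled design.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Buffer:
    name: str
    capture: bool       # userspace flagged the buffer for error capture
    cpu_mappable: bool  # False e.g. for non-mappable LMEM on small BAR


class CaptureRejected(Exception):
    """Raised when capture is requested on a recoverable context."""


def validate_submission(recoverable_context: bool,
                        buffers: List[Buffer]) -> None:
    # Point 1: capture buffers are refused on recoverable contexts.
    if recoverable_context and any(b.capture for b in buffers):
        raise CaptureRejected(
            "error capture requires a non-recoverable context")


def buffers_in_error_log(buffers: List[Buffer]) -> List[str]:
    # Point 2: best effort; unmappable capture buffers are skipped for now.
    return [b.name for b in buffers if b.capture and b.cpu_mappable]


bufs = [
    Buffer("batch", capture=True, cpu_mappable=True),
    Buffer("compressed_rt", capture=True, cpu_mappable=False),
    Buffer("scratch", capture=False, cpu_mappable=True),
]

validate_submission(recoverable_context=False, buffers=bufs)  # accepted
logged = buffers_in_error_log(bufs)

try:
    validate_submission(recoverable_context=True, buffers=bufs)
    rejected = False
except CaptureRejected:
    rejected = True
```

In this model the unmappable capture buffer is silently left out of the log
instead of failing the submission, which matches the "best effort" wording.
A later blit to system or mappable LMEM after the reset would only change
buffers_in_error_log, not the submission check.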

/Thomas

> > 
> > Anyway on the "we can't capture lmem-only compressed buffers", I
> > think
> > that's totally fine. Those are for render targets, and we don't
> > capture those. Adding Lionel and Ken to confirm.
> 
> Ken, Lionel: gentle ping on this?
> 
> > -Daniel
> > 
> > On Fri, 18 Mar 2022 at 17:26, Bloomfield, Jon
> > <jon.bloomfield@intel.com> wrote:
> > > 
> > > @Thomas Hellström - I agree :-)
> > > 
> > > My question was really to @Joonas Lahtinen, who was saying we
> > > could always migrate in the CPU fault handler. I am pushing back
> > > on that unless we have no choice. It's the very complication we
> > > were trying to avoid with the current SAS. If that's what's
> > > needed, then so be it. But I'm asking whether we can instead
> > > handle this specially, instead of adding generic complexity to
> > > the primary code paths.
> > > 
> > > Jon
> > > 
> > > > -----Original Message-----
> > > > From: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > > Sent: Friday, March 18, 2022 2:48 AM
> > > > To: Bloomfield, Jon <jon.bloomfield@intel.com>; Joonas Lahtinen
> > > > <joonas.lahtinen@linux.intel.com>; Intel Graphics Development
> > > > <intel-
> > > > gfx@lists.freedesktop.org>; Auld, Matthew
> > > > <matthew.auld@intel.com>; C,
> > > > Ramalingam <ramalingam.c@intel.com>; Vetter, Daniel
> > > > <daniel.vetter@intel.com>
> > > > Subject: Re: Small bar recovery vs compressed content on DG2
> > > > 
> > > > Hi,
> > > > 
> > > > On Thu, 2022-03-17 at 18:21 +0000, Bloomfield, Jon wrote:
> > > > > +@Vetter, Daniel
> > > > > 
> > > > > Let's not start re-inventing this on the fly again. That's
> > > > > how we got
> > > > > into trouble in the past. The SAS/Whitepaper does currently
> > > > > require
> > > > > the SMEM+LMEM placement for mappable, for good reasons.
> > > > 
> > > > Just to avoid any misunderstandings here:
> > > > 
> > > > We have two hard requirements from Arch that clash, main
> > > > problem is
> > > > compressed bos can't be captured on error with current designs.
> > > > 
> > > >  From an engineering point of view we can do little more than
> > > > list
> > > > options available to resolve this and whether they are hard or
> > > > not so
> > > > hard to implement. But IMHO Arch needs to agree on what's got
> > > > to
> > > > give.
> > > > 
> > > > Thanks,
> > > > Thomas
> > > > 
> > > > 
> > > > > 
> > > > > We cannot 'always migrate to mappable in the fault handler'.
> > > > > Or at
> > > > > least, this is not as trivial as it is to write in a sentence
> > > > > due to
> > > > > the need to spill out other active objects, and all the usual
> > > > > challenges with context synchronization etc. It is possible,
> > > > > perhaps
> > > > > with a lot of care, but it is challenging to guarantee, easy
> > > > > to
> > > > > break, and not needed for 99.9% of software. We are trying to
> > > > > simplify our driver stack.
> > > > > 
> > > > > If we need a special mechanism for debug, we should devise a
> > > > > special
> > > > > mechanism, not throw out the general LMEM+SMEM requirement.
> > > > > Are
> > > > there
> > > > > any identified first-class clients that require such access,
> > > > > or is it
> > > > > only debugging tools?
> > > > > 
> > > > > If only debug, then why can't the tool use a copy engine
> > > > > submission
> > > > > to access the data in place? Or perhaps a bespoke ioctl to
> > > > > access
> > > > > this via the KMD (and kmd submitted copy-engine BB)?
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > Jon
> > > > > 
> > > > > > -----Original Message-----
> > > > > > From: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > > > > Sent: Thursday, March 17, 2022 2:35 AM
> > > > > > To: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>;
> > > > > > Bloomfield,
> > > > > > Jon
> > > > > > <jon.bloomfield@intel.com>; Intel Graphics Development
> > > > > > <intel-
> > > > > > gfx@lists.freedesktop.org>; Auld, Matthew
> > > > > > <matthew.auld@intel.com>;
> > > > > > C,
> > > > > > Ramalingam <ramalingam.c@intel.com>
> > > > > > Subject: Re: Small bar recovery vs compressed content on
> > > > > > DG2
> > > > > > 
> > > > > > On Thu, 2022-03-17 at 10:43 +0200, Joonas Lahtinen wrote:
> > > > > > > Quoting Thomas Hellström (2022-03-16 09:25:16)
> > > > > > > > Hi!
> > > > > > > > 
> > > > > > > > Do we somehow need to clarify in the headers the
> > > > > > > > semantics for
> > > > > > > > this?
> > > > > > > > 
> > > > > > > >   From my understanding when discussing the CCS
> > > > > > > > migration series
> > > > > > > > with
> > > > > > > > Ram, the kernel will never do any resolving
> > > > > > > > (compressing /
> > > > > > > > decompressing) migrations or evictions which basically
> > > > > > > > implies
> > > > > > > > the
> > > > > > > > following:
> > > > > > > > 
> > > > > > > > *) Compressed data must have LMEM only placement,
> > > > > > > > otherwise the
> > > > > > GPU
> > > > > > > > would read garbage if accessing from SMEM.
> > > > > > > 
> > > > > > > This has always been the case, so it should be documented
> > > > > > > in the
> > > > > > > uAPI
> > > > > > > headers and kerneldocs.
> > > > > > > 
> > > > > > > > *) Compressed data can't be assumed to be mappable by
> > > > > > > > the CPU,
> > > > > > > > because
> > > > > > > > in order to ensure that on small BAR, the placement
> > > > > > > > needs to be
> > > > > > > > LMEM+SMEM.
> > > > > > > 
> > > > > > > Not strictly true, as we could always migrate to the
> > > > > > > mappable
> > > > > > > region
> > > > > > > in
> > > > > > > the CPU fault handler. Will need the same set of tricks
> > > > > > > as with
> > > > > > > limited
> > > > > > > mappable GGTT in past.
> > > > > > 
> > > > > > In addition to Matt's reply:
> > > > > > 
> > > > > > Yes, if there is sufficient space. I'm not sure we want to
> > > > > > complicate
> > > > > > this to migrate only part of the buffer to mappable on a
> > > > > > fault
> > > > > > basis?
> > > > > > Otherwise this is likely to fail.
> > > > > > 
> > > > > > One option is to allow cpu-mapping from SYSTEM like TTM is
> > > > > > doing
> > > > > > for
> > > > > > evicted buffers, even if SYSTEM is not in the placement
> > > > > > list, and
> > > > > > then
> > > > > > migrate back to LMEM for gpu access.
> > > > > > 
> > > > > > But can user-space even interpret the compressed data when
> > > > > > CPU-
> > > > > > mapping?
> > > > > > without access to the CCS metadata?
> > > > > > 
> > > > > > > 
> > > > > > > > *) Neither can compressed data be part of a CAPTURE
> > > > > > > > buffer,
> > > > > > > > because
> > > > > > > > that
> > > > > > > > requires the data to be CPU-mappable.
> > > > > > > 
> > > > > > > Especially this will be too big of a limitation which we
> > > > > > > can't
> > > > > > > really
> > > > > > > afford
> > > > > > > when it comes to debugging.
> > > > > > 
> > > > > > Same here WRT user-space interpretation.
> > > > > > 
> > > > > > This will become especially tricky on small BAR, because
> > > > > > either we
> > > > > > need
> > > > > > to fit all compressed buffers in the mappable portion, or
> > > > > > be able
> > > > > > to
> > > > > > blit the contents of the capture buffers from within the
> > > > > > fence
> > > > > > signalling critical section, which will require a lot of
> > > > > > work I
> > > > > > guess.
> > > > > > 
> > > > > > /Thomas
> > > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > Regards, Joonas
> > > > > > > 
> > > > > > > > Are we (and user-mode drivers) OK with these
> > > > > > > > restrictions, or
> > > > > > > > do we
> > > > > > > > need
> > > > > > > > to rethink?
> > > > > > > > 
> > > > > > > > Thanks,
> > > > > > > > 
> > > > > > > > Thomas
> > > > > > > > 
> > > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> > 




Thread overview: 11+ messages
2022-03-16  7:25 [Intel-gfx] Small bar recovery vs compressed content on DG2 Thomas Hellström
2022-03-17  8:43 ` Joonas Lahtinen
2022-03-17  9:29   ` Matthew Auld
2022-03-17  9:35   ` Thomas Hellström
2022-03-17 18:21     ` Bloomfield, Jon
2022-03-18  9:48       ` Thomas Hellström
2022-03-18 16:25         ` Bloomfield, Jon
2022-03-18 18:12           ` Daniel Vetter
2022-03-21  6:53             ` Thomas Hellström
2022-03-31  9:25             ` Matthew Auld
2022-04-04  9:04               ` Thomas Hellström
