Re: [PATCH 8/9] drm/i915: Implement .get_format_info() hook for CCS

From: Ben Widawsky <ben@bwidawsk.net>
To: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
Cc: Chad Versace <chadversary@chromium.org>,
	intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	Vandana Kannan <vandana.kannan@intel.com>
Subject: Re: [PATCH 8/9] drm/i915: Implement .get_format_info() hook for CCS
Date: Mon, 27 Feb 2017 21:36:47 -0800	[thread overview]
Message-ID: <20170228053645.GA4011@mail.bwidawsk.net> (raw)
In-Reply-To: <20170227151341.GE31595@intel.com>

On 17-02-27 17:13:41, Ville Syrjälä wrote:
>On Sun, Feb 26, 2017 at 02:41:50PM -0800, Chad Versace wrote:
>> On Wed 04 Jan 2017, ville.syrjala@linux.intel.com wrote:
>> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> >
>> > SKL+ display engine can scan out certain kinds of compressed surfaces
>> > produced by the render engine. This involved telling the display engine
>> > the location of the color control surfae (CCS) which describes which
>> > parts of the main surface are compressed and which are not. The location
>> > of CCS is provided by userspace as just another plane with its own offset.
>> >
>> > By providing our own format information for the CCS formats, we should
>> > be able to make framebuffer_check() do the right thing for the CCS
>> > surface as well.
>> >
>> > Note that we'll return the same format info for both Y and Yf tiled
>> > format as that's what happens with the non-CCS Y vs. Yf as well. If
>> > desired, we could potentially return a unique pointer for each
>> > pixel_format+tiling+ccs combination, in which case we immediately be
>> > able to tell if any of that stuff changed by just comparing the
>> > pointers. But that does sound a bit wasteful space wise.
>> >
>> > v2: Drop the 'dev' argument from the hook
>> > v3: Include the description of the CCS surface layout
>> >
>> > Cc: Vandana Kannan <vandana.kannan@intel.com>
>> > Cc: Daniel Vetter <daniel@ffwll.ch>
>> > Cc: Ben Widawsky <ben@bwidawsk.net>
>> > Cc: Jason Ekstrand <jason@jlekstrand.net>
>> > Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
>> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/intel_display.c | 36 ++++++++++++++++++++++++++
>> >  include/uapi/drm/drm_fourcc.h        | 49 ++++++++++++++++++++++++++++++++++++
>> >  2 files changed, 85 insertions(+)
>>
>>
>> > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
>> > index 9e1bb7fabcde..4581e3d41e5c 100644
>> > --- a/include/uapi/drm/drm_fourcc.h
>> > +++ b/include/uapi/drm/drm_fourcc.h
>> > @@ -230,6 +230,55 @@ extern "C" {
>> >  #define I915_FORMAT_MOD_Yf_TILED fourcc_mod_code(INTEL, 3)
>> >
>> >  /*
>> > + * Intel color control surface (CCS) for render compression
>> > + *
>> > + * The framebuffer format must be one of the 8:8:8:8 RGB formats,
>> > + * the main surface will be plane index 0 and will be Y/Yf-tiled,
>> > + * the CCS will be plane index 1.
>> > + *
>>
>> The paragraph below is...
>>
>> > + * Each byte of CCS contains 4 pairs of bits, with each pair of bits
>> > + * matching an area of 8x4 pixels of the main surface. Which would seem
>> > + * to match 2 cachelines containing 4x4 pixels each. The pairs bits within
>> > + * the byte form a 2x2 grid, which thus matches a 16x8 pixel area of the
>> > + * main surface. This is the 2x2 pattern the bits form (0=lsb, 7=msb):
>> > + * -----------
>> > + * | 01 | 23 |
>> > + *  ----------
>> > + * | 45 | 67 |
>> > + * -----------
>>
>> ...almost correct. The hw docs state that each bit-pair of the CCS maps
>> cacheline-pair, horizontally adjacent in the Y tile, of the primary surface. As
>> a consequence, the remainder of the above paragraph is correct for 32-bit
>> formats, but not others.
>
>Which is why the comment says at the very top that the fb needs to
>use a 8:8:8:8 format. IIRC that's the only thing the hardware supports.
>

It is, and it is for the foreseeable future too. Chad, granted you say this
isn't a nitpick below, but it is because Ville's patch is for the KMS case, it's
not entirely relevant here.

>>
>> This is not a nitpick, because Vulkan and EGL users may want to share
>> dma_bufs with a fat format like R32G32B32A32_UNORM, and want to have CCS
>> enabled when possible. As long as the users use the dma_buf only the 3D
>> engine, and don't submit it to KMS, it's all safe.
>>
>> For those users, we should document the bit-pair/cacheline-pair relationship
>> correctly. And then preface the following detailed explanation and nice ascii
>> diagrams by saying "this applies only to the 32-bit formats".
>>
>> Here's the relevant PRM quote:
>>
>>      The Color Control Surface (CCS) contains the compression status
>>      of the cache-line pairs. The compression state of the cache-line
>>      pair is specified by 2 bits in the CCS.  Each CCS cache-line
>>      represents an area on the main surface of 16x16 sets of 128 byte
>>      Y-tiled cache-line-pairs. CCS is always Y tiled.
>>
>> See this Mesa comment for more details:
>> https://cgit.freedesktop.org/mesa/mesa/tree/src/intel/isl/isl.c?h=17.0#n211
>>
>> > + * Actually only the lower bit of the pair seems to have any effect.
>> > + * No idea why. 0 in the lower bit would seem to mean not compressed,
>> > + * and 1 is compressed.  The interpreation of the main surface data
>> > + * when the block is marked compressed is unknown as of now.
>>
>> If I recall correctly (it's been several months since I inspected the
>> bits), the high bit is actually significant. Bit pattern 11 means that
>> the data in primary surface's cacheline-pair is invalid; the 3D engine
>> interprets the color of all pixels in that cacheline-pair to be the
>> clear color stored in RENDER_SURFACE_STATE. Before the display engine
>> can consume the surface, userspace is required to do a partial resolve,
>> flushing the clear color into the primary surface. So it makes sense
>> that the kernel would never observe that bit pattern in an incoming ccs
>> surface, as long as userspace is doing its job correctly. And it makes
>> sense that the display engine would ignore the high bit, because there is
>> no way to provide the clear color to the display engine (at least
>> according my reading of the docs).
>>
>> Jason, does this match your understanding of the high bit?
>>
>> > + *
>> > + * CCS tile is laid out in 8 byte horizontal strips each strip thus corresponds
>> > + * to a 128x8 pixel are of the main surface. So each 8x8 bytes of the CCS
>> > + * (1 cacheline) will match an area of 4x2 tiles on the main surface.
>> > + *
>> > + * Here is the layout of a full CCS tile, with the 8 byte strips numbered 0-511:
>> > + * ------------------------
>> > + * |  0 |  64 | ... | 448 |
>> > + * |  1 |  65 |     | 449 |
>> > + * |  2 |  66 |     | 450 |
>> > + * |  . |   . |     |   . |
>> > + * |  . |   . |     |   . |
>> > + * |  . |   . |     |   . |
>> > + * | 63 | 127 |     | 511 |
>> > + * ------------------------
>> > + *
>> > + * This will match an area of 1024x512 pixels on the main surface.
>> > + *
>> > + * The CCS plane pitch must be a multiple of the CCS tile width (64 bytes),
>> > + * and for the purposes of the CCS plane offset we assume cpp==1. As each
>> > + * byte matches a 16x8 area of the main surface, the dimensions of the CCS
>> > + * plane will thus appear to be width/16 x height/8. Both planes must be
>> > + * contained within the same gem object.
>>
>> Again, the above paragraphs should clarify that they apply only to 32-bit formats.
>>
>> > + */
>> > +#define I915_FORMAT_MOD_Y_TILED_CCS	fourcc_mod_code(INTEL, 4)
>> > +#define I915_FORMAT_MOD_Yf_TILED_CCS	fourcc_mod_code(INTEL, 5)
>
>-- 
>Ville Syrjälä
>Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx