From mboxrd@z Thu Jan 1 00:00:00 1970 From: Chad Versace Subject: Re: [PATCH] intel: Fix stencil buffer to be W tiled Date: Mon, 18 Jul 2011 14:24:03 -0700 Message-ID: <4E24A473.3040409@chad-versace.us> References: <1310975703-20269-1-git-send-email-chad@chad-versace.us> <1310975703-20269-3-git-send-email-chad@chad-versace.us> <4E248035.1030700@freedesktop.org> <4E249D96.1020408@chad-versace.us> <4E249F4F.4080702@freedesktop.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0031183365==" Return-path: In-Reply-To: <4E249F4F.4080702@freedesktop.org> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: mesa-dev-bounces+gcvmd-mesa3d-493=gmane.org@lists.freedesktop.org Errors-To: mesa-dev-bounces+gcvmd-mesa3d-493=gmane.org@lists.freedesktop.org To: Ian Romanick Cc: mesa-dev@lists.freedesktop.org, intel-gfx@lists.freedesktop.org List-Id: intel-gfx@lists.freedesktop.org This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --===============0031183365== Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enigD8C7657ABE8F2A059E49CF01" This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enigD8C7657ABE8F2A059E49CF01 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On 07/18/2011 02:02 PM, Ian Romanick wrote: > On 07/18/2011 01:54 PM, Chad Versace wrote: >> On 07/18/2011 11:49 AM, Ian Romanick wrote: >>> On 07/18/2011 12:55 AM, Chad Versace wrote: >>>> Until now, the stencil buffer was allocated as a Y tiled buffer, bec= ause >>>> in several locations the PRM states that it is. However, it is actua= lly >>>> W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section >>>> 4.5.2.1 W-Major Format: >>>> W-Major Tile Format is used for separate stencil. >>> >>>> The GTT is incapable of W fencing, so we allocate the stencil buffer= with >>>> I915_TILING_NONE and decode the tile's layout in software. >>> >>>> This fix touches the following portions of code: >>>> - In intel_allocate_renderbuffer_storage(), allocate the stencil= >>>> buffer with I915_TILING_NONE. >>>> - In intel_verify_dri2_has_hiz(), verify that the stencil buffer= is >>>> not tiled. >>>> - In the stencil buffer's span functions, the tile's layout must= be >>>> decoded in software. >>> >>>> This commit mutually depends on the xf86-video-intel commit >>>> dri: Do not tile stencil buffer >>>> Author: Chad Versace >>>> Date: Mon Jul 18 00:38:00 2011 -0700 >>> >>>> On Gen6 with separate stencil enabled, fixes the following Piglit te= sts: >>>> bugs/fdo23670-drawpix_stencil >>>> general/stencil-drawpixels >>>> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copyp= ixels >>>> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawp= ixels >>>> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readp= ixels >>>> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypi= xels >>>> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpi= xels >>>> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpi= xels >>>> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypi= xels >>>> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpi= xels >>>> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpi= xels >>>> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypi= xels >>>> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpi= xels >>>> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpi= xels >>>> spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-co= pypixels >>>> spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-re= adpixels >>>> spec/EXT_packed_depth_stencil/readpixels-24_8 >>> >>>> Note: This is a candidate for the 7.11 branch. >>> >>>> CC: Eric Anholt >>>> CC: Kenneth Graunke >>>> Signed-off-by: Chad Versace >>>> --- >>>> src/mesa/drivers/dri/intel/intel_clear.c | 6 ++ >>>> src/mesa/drivers/dri/intel/intel_context.c | 9 ++- >>>> src/mesa/drivers/dri/intel/intel_fbo.c | 13 +++-- >>>> src/mesa/drivers/dri/intel/intel_screen.h | 9 ++- >>>> src/mesa/drivers/dri/intel/intel_span.c | 85 +++++++++++++++++= +++-------- >>>> 5 files changed, 89 insertions(+), 33 deletions(-) >>> >>>> diff --git a/src/mesa/drivers/dri/intel/intel_clear.c b/src/mesa/dri= vers/dri/intel/intel_clear.c >>>> index dfca03c..5ab9873 100644 >>>> --- a/src/mesa/drivers/dri/intel/intel_clear.c >>>> +++ b/src/mesa/drivers/dri/intel/intel_clear.c >>>> @@ -143,6 +143,12 @@ intelClear(struct gl_context *ctx, GLbitfield m= ask) >>>> */ >>>> tri_mask |=3D BUFFER_BIT_STENCIL; >>>> } >>>> + else if (intel->has_separate_stencil && >>>> + stencilRegion->tiling =3D=3D I915_TILING_NONE) { >>>> + /* The stencil buffer is actually W tiled, which the hardware >>>> + * cannot blit to. */ >>>> + tri_mask |=3D BUFFER_BIT_STENCIL; >>>> + } >>>> else { >>>> /* clearing all stencil bits, use blitting */ >>>> blit_mask |=3D BUFFER_BIT_STENCIL; >>>> diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/d= rivers/dri/intel/intel_context.c >>>> index 2ba1363..fe8be08 100644 >>>> --- a/src/mesa/drivers/dri/intel/intel_context.c >>>> +++ b/src/mesa/drivers/dri/intel/intel_context.c >>>> @@ -1439,7 +1439,12 @@ intel_verify_dri2_has_hiz(struct intel_contex= t *intel, >>>> assert(stencil_rb->Base.Format =3D=3D MESA_FORMAT_S8); >>>> assert(depth_rb && depth_rb->Base.Format =3D=3D MESA_FORMAT_X= 8_Z24); >>> >>>> - if (stencil_rb->region->tiling =3D=3D I915_TILING_Y) { >>>> + if (stencil_rb->region->tiling =3D=3D I915_TILING_NONE) { >>>> + /* >>>> + * The stencil buffer is actually W tiled. The region's tiling is= >>>> + * I915_TILING_NONE, however, because the GTT is incapable of W >>>> + * fencing. >>>> + */ >>>> intel->intelScreen->dri2_has_hiz =3D INTEL_DRI2_HAS_HIZ_TRUE; >>>> return; >>>> } else { >>>> @@ -1527,7 +1532,7 @@ intel_verify_dri2_has_hiz(struct intel_context= *intel, >>>> * Presently, however, no verification or clean up is necessa= ry, and >>>> * execution should not reach here. If the framebuffer still = has a hiz >>>> * region, then we have already set dri2_has_hiz to true afte= r >>>> - * confirming above that the stencil buffer is Y tiled. >>>> + * confirming above that the stencil buffer is W tiled. >>>> */ >>>> assert(0); >>>> } >>>> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drive= rs/dri/intel/intel_fbo.c >>>> index 1669af2..507cc33 100644 >>>> --- a/src/mesa/drivers/dri/intel/intel_fbo.c >>>> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c >>>> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_conte= xt * ctx, struct gl_renderbuffer >>> >>>> if (irb->Base.Format =3D=3D MESA_FORMAT_S8) { >>>> /* >>>> + * The stencil buffer is W tiled. However, we request from th= e kernel a >>>> + * non-tiled buffer because the GTT is incapable of W fencing= =2E >>>> + * >>>> * The stencil buffer has quirky pitch requirements. From Vo= l 2a, >>>> * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch": >>>> * The pitch must be set to 2x the value computed based on= width, as >>>> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_con= text * ctx, struct gl_renderbuffer >>>> * To accomplish this, we resort to the nasty hack of doublin= g the drm >>>> * region's cpp and halving its height. >>>> * >>>> - * If we neglect to double the pitch, then drm_intel_gem_bo_m= ap_gtt() >>>> - * maps the memory incorrectly. >>>> + * If we neglect to double the pitch, then render corruption = occurs. >>>> */ >>>> irb->region =3D intel_region_alloc(intel->intelScreen, >>>> - I915_TILING_Y, >>>> + I915_TILING_NONE, >>>> cpp * 2, >>>> - width, >>>> - height / 2, >>>> + ALIGN(width, 64), >>>> + /* XXX: Maybe align to 128? */ >>>> + ALIGN(height / 2, 64), >>>> GL_TRUE); >>>> if (!irb->region) >>>> return false; >>>> diff --git a/src/mesa/drivers/dri/intel/intel_screen.h b/src/mesa/dr= ivers/dri/intel/intel_screen.h >>>> index b2013af..9dd6a52 100644 >>>> --- a/src/mesa/drivers/dri/intel/intel_screen.h >>>> +++ b/src/mesa/drivers/dri/intel/intel_screen.h >>>> @@ -63,9 +63,12 @@ >>>> * x8_z24 and s8). >>>> * >>>> * Eventually, intel_update_renderbuffers() makes a DRI2 request fo= r >>>> - * DRI2BufferStencil and DRI2BufferHiz. If the returned buffers are= Y tiled, >>>> - * then we joyfully set intel_screen.dri2_has_hiz to true and conti= nue as if >>>> - * nothing happend. >>>> + * DRI2BufferStencil and DRI2BufferHiz. If the stencil buffer's til= ing is >>>> + * I915_TILING_NONE [1], then we joyfully set intel_screen.dri2_has= _hiz to >>>> + * true and continue as if nothing happend. >>>> + * >>>> + * [1] The stencil buffer is actually W tiled. However, we request = from the >>>> + * kernel a non-tiled buffer because the GTT is incapable of W = fencing. >>>> * >>>> * If the buffers are X tiled, however, the handshake has failed an= d we must >>>> * clean up. >>>> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/driv= ers/dri/intel/intel_span.c >>>> index 153803f..d306432 100644 >>>> --- a/src/mesa/drivers/dri/intel/intel_span.c >>>> +++ b/src/mesa/drivers/dri/intel/intel_span.c >>>> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context = *intel, >>>> int miny =3D 0; \ >>>> int maxx =3D rb->Width; \ >>>> int maxy =3D rb->Height; \ >>>> - int stride =3D rb->RowStride; \ >>>> - uint8_t *buf =3D rb->Data; \ >>>> + \ >>>> + /* \ >>>> + * Here we ignore rb->Data and rb->RowStride as set by \ >>>> + * intelSpanRenderStart. Since intel_offset_S8 decodes the W til= e \ >>>> + * manually, the region's *real* base address and stride is \ >>>> + * required. \ >>>> + */ \ >>>> + struct intel_renderbuffer *irb =3D intel_renderbuffer(rb); \ >>>> + uint8_t *buf =3D irb->region->buffer->virtual; \ >>>> + unsigned stride =3D irb->region->pitch; \ >>>> + unsigned height =3D 2 * irb->region->height; \ >>>> + bool flip =3D rb->Name =3D=3D 0; \ >>> >>>> -/* Don't flip y. */ >>>> #undef Y_FLIP >>>> -#define Y_FLIP(y) y >>>> +#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y)) >>> >>>> /** >>>> * \brief Get pointer offset into stencil buffer. >>>> * >>>> - * The stencil buffer interleaves two rows into one. Yay for crazy = hardware. >>>> - * The table below demonstrates how the pointer arithmetic behaves = for a buffer >>>> - * with positive stride (s=3Dstride). >>>> - * >>>> - * x | y | byte offset >>>> - * -------------------------- >>>> - * 0 | 0 | 0 >>>> - * 0 | 1 | 1 >>>> - * 1 | 0 | 2 >>>> - * 1 | 1 | 3 >>>> - * ... | ... | ... >>>> - * 0 | 2 | s >>>> - * 0 | 3 | s + 1 >>>> - * 1 | 2 | s + 2 >>>> - * 1 | 3 | s + 3=20 >>>> - * >>>> + * The stencil buffer is W tiled. Since the GTT is incapable of W f= encing, we >>>> + * must decode the tile's layout in software. >>>> * >>>> + * See >>>> + * - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.2.1 W-= Major Tile >>>> + * Format. >>>> + * - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tili= ng Algorithm >>>> */ >>>> -static inline intptr_t >>>> -intel_offset_S8(int stride, GLint x, GLint y) >>>> +static inline uintptr_t >>> >>> Why the type change? >=20 >> Two reasons. >=20 >> 1. I redeclared the parameters as unsigned so to generate better code.= >> Since x, y, and stride are unsigned, the division and modulo operators= >> generate shift and bit-logic instructions rather than the slower arith= metic >> instructions. >=20 > Is stride always unsigned now?=20 intelSpanRenderStart still sets rb->RowStride to be negative for system stencil buffers. But we ignore that and use a positive stride instead. To quote the hunk above: + /* \ + * Here we ignore rb->Data and rb->RowStride as set by \ + * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile \ + * manually, the region's *real* base address and stride is \ + * required. \ + */ \ + struct intel_renderbuffer *irb =3D intel_renderbuffer(rb); \ + uint8_t *buf =3D irb->region->buffer->virtual; \ + unsigned stride =3D irb->region->pitch; By setting the buf and stride to the *real* base address and stride, intel_offset_S8 can implement the exact W-tiling algorithm as described in the bspec. (PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Ti= ling Algorithm). If intel_offset_S8 instead used rb->Data and rb->RowStride as the buffer'= s base address and stride, then intel_offset_S8 would essentially need two implementations, like this: intptr_t intel_offset_S8(...) { if (stride > 0) { // do normal stuff; } else { // do upside down stuff; } } I'd like to avoid such a bifurcated function. > Will it always be in the future?=20 Yes. > This is the problem that we encountered with bug #37351 (fixed by e8b1c= 6d). Ah... Thanks for recalling that bug. I believe setting the return type of= intel_offset_S8 to be intptr_t will eliminate reoccurrence of this bug. D= o you agree? >> Below is a comparison of x % 64, signed versus unsigned, compiled with= gcc -O3. >> In f, the % generates 5 instructions. In g, only one instruction. >=20 >> int f(int x) { return x % 64; } >=20 >> f: >> movl %edi, %edx >> sarl $31, %edx >> shrl $26, %edx >> leal (%rdi,%rdx), %eax >> andl $63, %eax >> subl %edx, %eax >> ret >=20 >> unsigned g(unsigned x) { return x % 64); >=20 >> g: >> movl %edi, %eax >> andl $63, %eax >> ret >=20 >> 2. I redeclared the return type as unsigned to make it explicit that i= t does >> not return a negative offset. >=20 >>>> +intel_offset_S8(uint32_t stride, uint32_t x, uint32_t y) >>>> { >>>> - return 2 * ((y / 2) * stride + x) + y % 2; >>>> + uint32_t tile_size =3D 4096; >>>> + uint32_t tile_width =3D 64; >>>> + uint32_t tile_height =3D 64; >>>> + uint32_t row_size =3D 64 * stride; >>>> + >>>> + uint32_t tile_x =3D x / tile_width; >>>> + uint32_t tile_y =3D y / tile_height; >>>> + >>>> + /* The byte's address relative to the tile's base addres. */ >>>> + uint32_t byte_x =3D x % tile_width; >>>> + uint32_t byte_y =3D y % tile_height; >>>> + >>>> + uintptr_t u =3D tile_y * row_size >>>> + + tile_x * tile_size >>>> + + 512 * (byte_x / 8) >>>> + + 64 * (byte_y / 8) >>>> + + 32 * ((byte_y / 4) % 2) >>>> + + 16 * ((byte_x / 4) % 2) >>>> + + 8 * ((byte_y / 2) % 2) >>>> + + 4 * ((byte_x / 2) % 2) >>>> + + 2 * (byte_y % 2) >>>> + + 1 * (byte_x % 2); >>>> + >>>> + /* >>>> + * Errata for Gen5: >>>> + * >>>> + * An additional offset is needed which is not documented in the= PRM. >>>> + * >>>> + * if ((byte_x / 8) % 2 =3D=3D 1) { >>>> + * if ((byte_y / 8) % 2) =3D=3D 0) { >>>> + * u +=3D 64; >>>> + * } else { >>>> + * u -=3D 64; >>>> + * } >>>> + * } >>>> + * >>>> + * The offset is expressed more tersely as >>>> + * u +=3D ((int) x & 0x8) * (8 - (((int) y & 0x8) << 1)); >>>> + */ >>>> + >>>> + return u; >>>> } >>> >>>> #define WRITE_STENCIL(x, y, src) buf[intel_offset_S8(stride, x, y)= ] =3D src; --=20 Chad Versace chad@chad-versace.us --------------enigD8C7657ABE8F2A059E49CF01 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQIcBAEBAgAGBQJOJKRzAAoJEAIvNt057x8iSWIP+wVnJddYg/swJERhPR/o0yXp Npt6+hoK0QWsA121vnlw8pOQhpIIkMOCV4Q+LvyCd1GCuOO5ggaxv4cG/pDwIo3q NSPCHVwLM6Mt536X0YhumDYbkZHBW/cdyIKvMXC1lZUFeMe8lm3dBx7x3suGEfim 6wwGY9Jys8D4YFCW7PN5ijfDqGXH928aAHVswoZ3XlLeTcIHpqjQh76el4SH40dz hdTQPMS89TdfVxEJ4CtnOnt/y5OevtOr0LCkh6ksbKWouW6pTTKEoFYgYm5U5E7A rOvCNtQNEB2klonW03Kpq1P0yUyrWUvnxPVfCvKxWA/rWkJQv1k1Om9KJVyxfQtX 0YCw9/dMqhfJ0BFSSpRK8wpW7jP+2TVaofEI8HlMFnIpb1kMs5xtE7gPq3oaYnPf r22YXxVjBuJUG6h8aA70v2+0hZp/ZhJvLLJ2jyEfEO+OLTjae24a1qojsuMYbH35 Lsw/Cxyt4mTnlIKM/NAY0nOc6jwkZkhBUjNx6yO62OZQn4q2VlzshusEKCh/6nVt iEhIteqppUIQdcUbukCamlX75xEVjxwA7m53Sj/p3nTeIamcC4OQOx2am/3CZLTv eubUp/3sBwaEhSpEHxPNbihmhG532714CClkQ/mDKpP6zVS6g2BAkG4a8WEvepub XYKQukAcN8Kn+vTb1EQv =KYVS -----END PGP SIGNATURE----- --------------enigD8C7657ABE8F2A059E49CF01-- --===============0031183365== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev --===============0031183365==--