From mboxrd@z Thu Jan 1 00:00:00 1970 From: Chad Versace Subject: Re: [Mesa-dev] [PATCH] intel: Fix stencil buffer to be W tiled Date: Mon, 18 Jul 2011 13:54:46 -0700 Message-ID: <4E249D96.1020408@chad-versace.us> References: <1310975703-20269-1-git-send-email-chad@chad-versace.us> <1310975703-20269-3-git-send-email-chad@chad-versace.us> <4E248035.1030700@freedesktop.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2066311384==" Return-path: In-Reply-To: <4E248035.1030700@freedesktop.org> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org To: Ian Romanick Cc: mesa-dev@lists.freedesktop.org, intel-gfx@lists.freedesktop.org List-Id: intel-gfx@lists.freedesktop.org This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --===============2066311384== Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig55C83AF4AF91BF312F7E4965" This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig55C83AF4AF91BF312F7E4965 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On 07/18/2011 11:49 AM, Ian Romanick wrote: > On 07/18/2011 12:55 AM, Chad Versace wrote: >> Until now, the stencil buffer was allocated as a Y tiled buffer, becau= se >> in several locations the PRM states that it is. However, it is actuall= y >> W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section >> 4.5.2.1 W-Major Format: >> W-Major Tile Format is used for separate stencil. >=20 >> The GTT is incapable of W fencing, so we allocate the stencil buffer w= ith >> I915_TILING_NONE and decode the tile's layout in software. >=20 >> This fix touches the following portions of code: >> - In intel_allocate_renderbuffer_storage(), allocate the stencil >> buffer with I915_TILING_NONE. >> - In intel_verify_dri2_has_hiz(), verify that the stencil buffer i= s >> not tiled. >> - In the stencil buffer's span functions, the tile's layout must b= e >> decoded in software. >=20 >> This commit mutually depends on the xf86-video-intel commit >> dri: Do not tile stencil buffer >> Author: Chad Versace >> Date: Mon Jul 18 00:38:00 2011 -0700 >=20 >> On Gen6 with separate stencil enabled, fixes the following Piglit test= s: >> bugs/fdo23670-drawpix_stencil >> general/stencil-drawpixels >> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copypix= els >> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawpix= els >> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readpix= els >> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypixe= ls >> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpixe= ls >> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpixe= ls >> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypixe= ls >> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpixe= ls >> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpixe= ls >> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypixe= ls >> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpixe= ls >> spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpixe= ls >> spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-copy= pixels >> spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-read= pixels >> spec/EXT_packed_depth_stencil/readpixels-24_8 >=20 >> Note: This is a candidate for the 7.11 branch. >=20 >> CC: Eric Anholt >> CC: Kenneth Graunke >> Signed-off-by: Chad Versace >> --- >> src/mesa/drivers/dri/intel/intel_clear.c | 6 ++ >> src/mesa/drivers/dri/intel/intel_context.c | 9 ++- >> src/mesa/drivers/dri/intel/intel_fbo.c | 13 +++-- >> src/mesa/drivers/dri/intel/intel_screen.h | 9 ++- >> src/mesa/drivers/dri/intel/intel_span.c | 85 +++++++++++++++++++= +-------- >> 5 files changed, 89 insertions(+), 33 deletions(-) >=20 >> diff --git a/src/mesa/drivers/dri/intel/intel_clear.c b/src/mesa/drive= rs/dri/intel/intel_clear.c >> index dfca03c..5ab9873 100644 >> --- a/src/mesa/drivers/dri/intel/intel_clear.c >> +++ b/src/mesa/drivers/dri/intel/intel_clear.c >> @@ -143,6 +143,12 @@ intelClear(struct gl_context *ctx, GLbitfield mas= k) >> */ >> tri_mask |=3D BUFFER_BIT_STENCIL; >> } >> + else if (intel->has_separate_stencil && >> + stencilRegion->tiling =3D=3D I915_TILING_NONE) { >> + /* The stencil buffer is actually W tiled, which the hardware >> + * cannot blit to. */ >> + tri_mask |=3D BUFFER_BIT_STENCIL; >> + } >> else { >> /* clearing all stencil bits, use blitting */ >> blit_mask |=3D BUFFER_BIT_STENCIL; >> diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/dri= vers/dri/intel/intel_context.c >> index 2ba1363..fe8be08 100644 >> --- a/src/mesa/drivers/dri/intel/intel_context.c >> +++ b/src/mesa/drivers/dri/intel/intel_context.c >> @@ -1439,7 +1439,12 @@ intel_verify_dri2_has_hiz(struct intel_context = *intel, >> assert(stencil_rb->Base.Format =3D=3D MESA_FORMAT_S8); >> assert(depth_rb && depth_rb->Base.Format =3D=3D MESA_FORMAT_X8_= Z24); >=20 >> - if (stencil_rb->region->tiling =3D=3D I915_TILING_Y) { >> + if (stencil_rb->region->tiling =3D=3D I915_TILING_NONE) { >> + /* >> + * The stencil buffer is actually W tiled. The region's tiling is >> + * I915_TILING_NONE, however, because the GTT is incapable of W >> + * fencing. >> + */ >> intel->intelScreen->dri2_has_hiz =3D INTEL_DRI2_HAS_HIZ_TRUE; >> return; >> } else { >> @@ -1527,7 +1532,7 @@ intel_verify_dri2_has_hiz(struct intel_context *= intel, >> * Presently, however, no verification or clean up is necessary= , and >> * execution should not reach here. If the framebuffer still ha= s a hiz >> * region, then we have already set dri2_has_hiz to true after >> - * confirming above that the stencil buffer is Y tiled. >> + * confirming above that the stencil buffer is W tiled. >> */ >> assert(0); >> } >> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers= /dri/intel/intel_fbo.c >> index 1669af2..507cc33 100644 >> --- a/src/mesa/drivers/dri/intel/intel_fbo.c >> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c >> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context= * ctx, struct gl_renderbuffer >=20 >> if (irb->Base.Format =3D=3D MESA_FORMAT_S8) { >> /* >> + * The stencil buffer is W tiled. However, we request from the = kernel a >> + * non-tiled buffer because the GTT is incapable of W fencing. >> + * >> * The stencil buffer has quirky pitch requirements. From Vol = 2a, >> * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch": >> * The pitch must be set to 2x the value computed based on w= idth, as >> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_conte= xt * ctx, struct gl_renderbuffer >> * To accomplish this, we resort to the nasty hack of doubling = the drm >> * region's cpp and halving its height. >> * >> - * If we neglect to double the pitch, then drm_intel_gem_bo_map= _gtt() >> - * maps the memory incorrectly. >> + * If we neglect to double the pitch, then render corruption oc= curs. >> */ >> irb->region =3D intel_region_alloc(intel->intelScreen, >> - I915_TILING_Y, >> + I915_TILING_NONE, >> cpp * 2, >> - width, >> - height / 2, >> + ALIGN(width, 64), >> + /* XXX: Maybe align to 128? */ >> + ALIGN(height / 2, 64), >> GL_TRUE); >> if (!irb->region) >> return false; >> diff --git a/src/mesa/drivers/dri/intel/intel_screen.h b/src/mesa/driv= ers/dri/intel/intel_screen.h >> index b2013af..9dd6a52 100644 >> --- a/src/mesa/drivers/dri/intel/intel_screen.h >> +++ b/src/mesa/drivers/dri/intel/intel_screen.h >> @@ -63,9 +63,12 @@ >> * x8_z24 and s8). >> * >> * Eventually, intel_update_renderbuffers() makes a DRI2 request for >> - * DRI2BufferStencil and DRI2BufferHiz. If the returned buffers are Y= tiled, >> - * then we joyfully set intel_screen.dri2_has_hiz to true and continu= e as if >> - * nothing happend. >> + * DRI2BufferStencil and DRI2BufferHiz. If the stencil buffer's tilin= g is >> + * I915_TILING_NONE [1], then we joyfully set intel_screen.dri2_has_h= iz to >> + * true and continue as if nothing happend. >> + * >> + * [1] The stencil buffer is actually W tiled. However, we request fr= om the >> + * kernel a non-tiled buffer because the GTT is incapable of W fe= ncing. >> * >> * If the buffers are X tiled, however, the handshake has failed and = we must >> * clean up. >> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/driver= s/dri/intel/intel_span.c >> index 153803f..d306432 100644 >> --- a/src/mesa/drivers/dri/intel/intel_span.c >> +++ b/src/mesa/drivers/dri/intel/intel_span.c >> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *i= ntel, >> int miny =3D 0; \ >> int maxx =3D rb->Width; \ >> int maxy =3D rb->Height; \ >> - int stride =3D rb->RowStride; \ >> - uint8_t *buf =3D rb->Data; \ >> + \ >> + /* \ >> + * Here we ignore rb->Data and rb->RowStride as set by \ >> + * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile = \ >> + * manually, the region's *real* base address and stride is \ >> + * required. \ >> + */ \ >> + struct intel_renderbuffer *irb =3D intel_renderbuffer(rb); \ >> + uint8_t *buf =3D irb->region->buffer->virtual; \ >> + unsigned stride =3D irb->region->pitch; \ >> + unsigned height =3D 2 * irb->region->height; \ >> + bool flip =3D rb->Name =3D=3D 0; \ >=20 >> -/* Don't flip y. */ >> #undef Y_FLIP >> -#define Y_FLIP(y) y >> +#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y)) >=20 >> /** >> * \brief Get pointer offset into stencil buffer. >> * >> - * The stencil buffer interleaves two rows into one. Yay for crazy ha= rdware. >> - * The table below demonstrates how the pointer arithmetic behaves fo= r a buffer >> - * with positive stride (s=3Dstride). >> - * >> - * x | y | byte offset >> - * -------------------------- >> - * 0 | 0 | 0 >> - * 0 | 1 | 1 >> - * 1 | 0 | 2 >> - * 1 | 1 | 3 >> - * ... | ... | ... >> - * 0 | 2 | s >> - * 0 | 3 | s + 1 >> - * 1 | 2 | s + 2 >> - * 1 | 3 | s + 3=20 >> - * >> + * The stencil buffer is W tiled. Since the GTT is incapable of W fen= cing, we >> + * must decode the tile's layout in software. >> * >> + * See >> + * - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.2.1 W-Ma= jor Tile >> + * Format. >> + * - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling= Algorithm >> */ >> -static inline intptr_t >> -intel_offset_S8(int stride, GLint x, GLint y) >> +static inline uintptr_t >=20 > Why the type change? Two reasons. 1. I redeclared the parameters as unsigned so to generate better code. Since x, y, and stride are unsigned, the division and modulo operators generate shift and bit-logic instructions rather than the slower arithmet= ic instructions. Below is a comparison of x % 64, signed versus unsigned, compiled with gc= c -O3. In f, the % generates 5 instructions. In g, only one instruction. int f(int x) { return x % 64; } f: movl %edi, %edx sarl $31, %edx shrl $26, %edx leal (%rdi,%rdx), %eax andl $63, %eax subl %edx, %eax ret unsigned g(unsigned x) { return x % 64); g: movl %edi, %eax andl $63, %eax ret 2. I redeclared the return type as unsigned to make it explicit that it d= oes not return a negative offset. >> +intel_offset_S8(uint32_t stride, uint32_t x, uint32_t y) >> { >> - return 2 * ((y / 2) * stride + x) + y % 2; >> + uint32_t tile_size =3D 4096; >> + uint32_t tile_width =3D 64; >> + uint32_t tile_height =3D 64; >> + uint32_t row_size =3D 64 * stride; >> + >> + uint32_t tile_x =3D x / tile_width; >> + uint32_t tile_y =3D y / tile_height; >> + >> + /* The byte's address relative to the tile's base addres. */ >> + uint32_t byte_x =3D x % tile_width; >> + uint32_t byte_y =3D y % tile_height; >> + >> + uintptr_t u =3D tile_y * row_size >> + + tile_x * tile_size >> + + 512 * (byte_x / 8) >> + + 64 * (byte_y / 8) >> + + 32 * ((byte_y / 4) % 2) >> + + 16 * ((byte_x / 4) % 2) >> + + 8 * ((byte_y / 2) % 2) >> + + 4 * ((byte_x / 2) % 2) >> + + 2 * (byte_y % 2) >> + + 1 * (byte_x % 2); >> + >> + /* >> + * Errata for Gen5: >> + * >> + * An additional offset is needed which is not documented in the P= RM. >> + * >> + * if ((byte_x / 8) % 2 =3D=3D 1) { >> + * if ((byte_y / 8) % 2) =3D=3D 0) { >> + * u +=3D 64; >> + * } else { >> + * u -=3D 64; >> + * } >> + * } >> + * >> + * The offset is expressed more tersely as >> + * u +=3D ((int) x & 0x8) * (8 - (((int) y & 0x8) << 1)); >> + */ >> + >> + return u; >> } >=20 >> #define WRITE_STENCIL(x, y, src) buf[intel_offset_S8(stride, x, y)] = =3D src; --=20 Chad Versace chad@chad-versace.us --------------enig55C83AF4AF91BF312F7E4965 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQIcBAEBAgAGBQJOJJ2WAAoJEAIvNt057x8ib6EP/RyBjuSXpQoY1RbhCYHTyXAa KzY9/MbisQpY3DbFahhsbb6S8fjlvRfc+whjvGHyifMJQeKdrgaPRSxYLre2hr4A IoSa24PFX0VJRtE4ydJZag6MUsloceLx57RJBwQzmzNDi0b5DE1I5hL6eXnVWu38 TBbmT5pDCv9rHe/wXZo2UEtNgL2ZcE5Ua8SJkIQrnDgl2h46MgiAEr759g30sM+Z JKTfksci9R6XiAyrrK6jAHAOA9uCY4ZShr4Xssb2DWe/eQVbTpO4YUyqSujhNvlV gFDC3qxkqEbRPAGKjzQqTmZvKRCtMhBBflPvAZx3DRSkiuYqKDmk89VxEMgJbiaL F7qiuQCXLWB/3Ppht5HgRSYh7fflb+z5uWX5Se8vQyvdxylA/l4O8bc1SxZwkWRj pXE+fyzovBap/dLa4qmJmIavjD5CYAubjfnhGyoRhKonncwlL/k7u+g6CQPuh/ad j6fQ0Cb/qjRhz0NXn3c6Su6wV+HJTHMcQMB+eYaAgK3YhSgr2MxqGELxd1PjmU3e CDU1mE1dcuvWFqtEUtRLeFpl8JWK7w0vIbiGO6fnvnoGy6Tnho7aa2TX6WCLI/Ce CrSmBR9OJYf+EGKfB+kbCAsSsweyK2dj+HLzePen5AoA5wmYR4e1UEMUo/Wd0mwr diuaB0RagPXsOeGRxbeq =fF3V -----END PGP SIGNATURE----- --------------enig55C83AF4AF91BF312F7E4965-- --===============2066311384== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx --===============2066311384==--