From: Francisco Jerez <currojerez@riseup.net>
To: Zhigang Gong <zhigang.gong@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org, Zhigang Gong <zhigang.gong@intel.com>
Subject: Re: [PATCH] drm/i915/hsw: enable atomic in L3 for some steppings.
Date: Mon, 05 Jan 2015 19:27:26 +0200 [thread overview]
Message-ID: <87h9w5gmtd.fsf@riseup.net> (raw)
In-Reply-To: <20150105023637.GA28045@ivb-gt2-rev4>
[-- Attachment #1.1.1: Type: text/plain, Size: 4272 bytes --]
Zhigang Gong <zhigang.gong@linux.intel.com> writes:
> On Mon, Jan 05, 2015 at 05:03:16AM +0200, Francisco Jerez wrote:
>> Zhigang Gong <zhigang.gong@intel.com> writes:
>>
>> > According to bspec, ROW_CHICKEN3's bit 6 which is to disable
>> > move of cacheable global atomics to L3 is needed for GT3 D
>> > stepping.
>> >
>> > I enabled it and tested it with HSW GT2 D stepping and GT3 E stepping.
>> > The atomics works fine in beignet. And it could get more than 10x performance
>> > improvement with some workload, for an example, the "splat" kernel in darktable,
>> > without this patch, it consumes 50 seconds in one large raw picture processing.
>> > But with this patch, the same process only takes less than 1 second.
>> >
>>
>> I tried this already (on HSW GT2 D as well) and I don't think it's
>> enough to get L3 atomics working reliably. Even though they did seem to
>> work OK at first glance I observed some corruption issues (e.g. atomic
>> writes not landing in system memory) when doing atomic writes to
>> contiguous (as in within the same cache-line) locations in memory. The
>> "unused" ARB_shader_image_load_store test [1] I sent to the Piglit
>> mailing list some time ago exposes this IIRC, and probably a couple of
>> other tests too.
> Ok, I will find that case and have a try on my systems. I just tested all
> the atomic related OpenCL conformance test cases without any issues.
>
>>
>> Also this change is going to cause an instant lock-up anytime Mesa uses
>> atomics because Mesa doesn't change the default L3 way allocation for
>> the DC, which turns out to be 0 on HSW.
>
> This is another issue, IMHO, if the application wants to use atomics,
> it's better to allocate some L3 space for DC. Otherwise, it could
> never leverage the "atomics in L3 feature". Based on my test,
> the performance impact as huge as more than 10x for some workloads.
>
Sure, but this change alone will cause a regression (an irrecoverable
system hang) with current releases of Mesa.
I agree that Mesa should be fixed eventually to assign L3 space to the
DC for some workloads, but it doesn't seem like we have a satisfactory
API to do that right now -- The current mechanism used by Beignet
(poking the L3 control registers from the batch buffer) is slightly
concerning from a security point of view because it allows an arbitrary
userspace application to cause misrendering or impact the performance of
other clients (since the L3 config registers are not being saved and
restored as part of the context) and even crash the whole system. It
also doesn't look like it could be made to work with the hardware
command checker.
> Thanks,
> Zhigang Gong.
>
>>
>> [1] http://lists.freedesktop.org/archives/piglit/2014-December/013571.html
>>
>> > Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
>> > ---
>> > drivers/gpu/drm/i915/intel_pm.c | 10 ++++++----
>> > 1 file changed, 6 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
>> > index 7d99a9c..8a27802 100644
>> > --- a/drivers/gpu/drm/i915/intel_pm.c
>> > +++ b/drivers/gpu/drm/i915/intel_pm.c
>> > @@ -5938,10 +5938,12 @@ static void haswell_init_clock_gating(struct drm_device *dev)
>> >
>> > ilk_init_lp_watermarks(dev);
>> >
>> > - /* L3 caching of data atomics doesn't work -- disable it. */
>> > - I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
>> > - I915_WRITE(HSW_ROW_CHICKEN3,
>> > - _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE));
>> > + if (IS_HSW_GT3(dev) && dev->pdev->revision <= 6) {
>> > + /* L3 caching of data atomics doesn't work -- disable it. */
>> > + I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
>> > + I915_WRITE(HSW_ROW_CHICKEN3,
>> > + _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE));
>> > + }
>> >
>> > /* This is required by WaCatErrorRejectionIssue:hsw */
>> > I915_WRITE(GEN7_SQ_CHICKEN_MBCUNIT_CONFIG,
>> > --
>> > 1.8.3.2
>
>
>
>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[-- Attachment #1.2: Type: application/pgp-signature, Size: 212 bytes --]
[-- Attachment #2: Type: text/plain, Size: 159 bytes --]
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2015-01-05 17:30 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-01-05 1:05 [PATCH] drm/i915/hsw: enable atomic in L3 for some steppings Zhigang Gong
2015-01-05 3:03 ` Francisco Jerez
2015-01-05 2:36 ` Zhigang Gong
2015-01-05 17:27 ` Francisco Jerez [this message]
2015-01-06 8:01 ` Zhigang Gong
2015-01-06 14:19 ` Francisco Jerez
2015-01-05 7:05 ` shuang.he
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87h9w5gmtd.fsf@riseup.net \
--to=currojerez@riseup.net \
--cc=intel-gfx@lists.freedesktop.org \
--cc=zhigang.gong@intel.com \
--cc=zhigang.gong@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox