From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
intel-gfx@lists.freedesktop.org,
Akash Goel <akash.goel@intel.com>,
Mika Kuoppala <mika.kuoppala@intel.com>
Subject: Re: [PATCH] drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memory
Date: Mon, 18 Jul 2016 13:07:28 +0100 [thread overview]
Message-ID: <578CC680.80206@linux.intel.com> (raw)
In-Reply-To: <20160718113501.GH21839@nuc-i3427.alporthouse.com>
On 18/07/16 12:35, Chris Wilson wrote:
> On Mon, Jul 18, 2016 at 12:15:32PM +0100, Tvrtko Ursulin wrote:
>> I am not sure about this, but looking at the raid6 for example, it
>> has a lot more annotations in cases like this.
>>
>> It seems to be telling the compiler which memory ranges does each
>> instruction access, and also uses "asm volatile" - whether or not
>> that is really needed I don't know.
>>
>> For example:
>> asm volatile("movdqa %0,%%xmm4" :: "m" (dptr[z0][d]));
>>
>> And:
>> asm volatile("movdqa %%xmm4,%0" : "=m" (q[d]));
>>
>> Each one is telling the compiler the instruction is either reading
>> or writing respectively from a certain memory address.
>>
>> You don't have any of that, and don't even specify nothing as an
>> output parameter so I am not sure if your code is safe.
>
> The asm is correct. We do not modify either of the two pointers which we
> pass in via register inputs, but the memory behind them - hence the memory
> clobber.
So you are saying memory clobber is like a big hammer telling the
compiler you are modifying some memory and it is not allowed to assume
or remove anything?
There would be no benefit in being more specific, like the RAID code does?
>>> +void i915_memcpy_init_early(struct drm_i915_private *dev_priv)
>>> +{
>>> + if (static_cpu_has(X86_FEATURE_XMM4_1))
>>> + static_branch_enable(&has_movntdqa);
>>> +}
>>>
>>
>> I was not familiar with static key stuff and the only thing I can
>> notice is that it is used very little throughout the kernel. On the
>> other hand I haven't found any references in the documentation that
>> it should be used sparingly or something.
>>
>> But the general question would be - is it worth it here? Static
>> branches should be really efficient in the off case, correct? And we
>> don't really care about the performance of the off case here. So
>> would it be just as good to use a normal branch?
>
> It's not the cost of the branch, but the static_cpu_has() in comparison
> to a small copy.
Could cache it in dev_priv. Not saying you should, just asking about
pros and cons of the two approaches vs the amount of code. Well not even
the amount of code, just the fact static keys seem to be used so little
so wondering why.
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2016-07-18 12:07 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-07-16 12:07 [PATCH] drm/i915: Use SSE4.1 movntqda to accelerate reads from WC memory Chris Wilson
2016-07-16 12:32 ` ✗ Ro.CI.BAT: failure for " Patchwork
2016-07-16 12:33 ` [PATCH] " Chris Wilson
2016-07-16 12:45 ` [PATCH v2] " Chris Wilson
2016-07-16 12:53 ` [PATCH v3] " Chris Wilson
2016-07-16 15:44 ` [PATCH v5] drm/i915: Use SSE4.1 movntdqa " Chris Wilson
2016-07-17 3:21 ` Matt Turner
2016-07-17 8:12 ` Chris Wilson
2016-07-18 9:31 ` Tvrtko Ursulin
2016-07-18 10:01 ` Chris Wilson
2016-07-18 10:07 ` [PATCH] " Chris Wilson
2016-07-18 10:29 ` Chris Wilson
2016-07-18 11:15 ` Tvrtko Ursulin
2016-07-18 11:35 ` Chris Wilson
2016-07-18 11:57 ` Dave Gordon
2016-07-18 12:56 ` Tvrtko Ursulin
2016-07-18 13:46 ` Tvrtko Ursulin
2016-07-18 15:06 ` Tvrtko Ursulin
2016-07-18 16:05 ` Dave Gordon
2016-07-19 10:26 ` Tvrtko Ursulin
2016-07-18 13:48 ` Dave Gordon
2016-07-18 12:07 ` Tvrtko Ursulin [this message]
2016-07-19 6:50 ` Daniel Vetter
2016-07-16 13:12 ` ✗ Ro.CI.BAT: failure for drm/i915: Use SSE4.1 movntqda to accelerate reads from WC memory (rev2) Patchwork
2016-07-16 13:34 ` ✓ Ro.CI.BAT: success for drm/i915: Use SSE4.1 movntqda to accelerate reads from WC memory (rev3) Patchwork
2016-07-16 16:12 ` ✗ Ro.CI.BAT: failure for drm/i915: Use SSE4.1 movntqda to accelerate reads from WC memory (rev4) Patchwork
2016-07-18 10:59 ` ✗ Ro.CI.BAT: failure for drm/i915: Use SSE4.1 movntqda to accelerate reads from WC memory (rev5) Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=578CC680.80206@linux.intel.com \
--to=tvrtko.ursulin@linux.intel.com \
--cc=akash.goel@intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
--cc=mika.kuoppala@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox