* [PATCH] drm: Explicitly compute the last cacheline for clflush on range
@ 2015-10-16 19:55 Chris Wilson
2015-10-17 20:03 ` Imre Deak
0 siblings, 1 reply; 6+ messages in thread
From: Chris Wilson @ 2015-10-16 19:55 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter, dri-devel
Fixes regression from
commit afcd950cafea6e27b739fe7772cbbeed37d05b8b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Jun 10 15:58:01 2015 +0100
drm: Avoid the double clflush on the last cache line in drm_clflush_virt_range()
I'm stumped. Looking at the loop we should be iterating over every cache
line until we reach the start of the cacheline after the end of the
virtual range. Evidence says otherwise.
More bizarrely, I stored the last address to be clflushed and found it to
be equal to the start of the cacheline containing the last byte. Doubly
perplexed.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92501
Testcase: gem_tiled_partial_pwrite_pread/reads
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
drivers/gpu/drm/drm_cache.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 6743ff7dccfa..7c909bc8b68a 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -131,10 +131,13 @@ drm_clflush_virt_range(void *addr, unsigned long length)
#if defined(CONFIG_X86)
if (cpu_has_clflush) {
const int size = boot_cpu_data.x86_clflush_size;
- void *end = addr + length;
- addr = (void *)(((unsigned long)addr) & -size);
+ void *end;
+
+ end = (void *)(((unsigned long)addr + length - 1) & -size);
+ addr = (void *)((unsigned long)addr & -size);
+
mb();
- for (; addr < end; addr += size)
+ for (; addr <= end; addr += size)
clflushopt(addr);
mb();
return;
--
2.6.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: [PATCH] drm: Explicitly compute the last cacheline for clflush on range
2015-10-16 19:55 [PATCH] drm: Explicitly compute the last cacheline for clflush on range Chris Wilson
@ 2015-10-17 20:03 ` Imre Deak
2015-10-18 12:28 ` Chris Wilson
0 siblings, 1 reply; 6+ messages in thread
From: Imre Deak @ 2015-10-17 20:03 UTC (permalink / raw)
To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx, dri-devel
On Fri, 2015-10-16 at 20:55 +0100, Chris Wilson wrote:
> Fixes regression from
>
> commit afcd950cafea6e27b739fe7772cbbeed37d05b8b
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Wed Jun 10 15:58:01 2015 +0100
>
> drm: Avoid the double clflush on the last cache line in drm_clflush_virt_range()
>
> I'm stumped. Looking at the loop we should be iterating over every cache
> line until we reach the start of the cacheline after the end of the
> virtual range. Evidence says otherwise.
>
> More bizarrely, I stored the last address to be clflushed and found it to
> be equal to the start of the cacheline containing the last byte. Doubly
> perplexed.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92501
> Testcase: gem_tiled_partial_pwrite_pread/reads
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Imre Deak <imre.deak@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
> drivers/gpu/drm/drm_cache.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> index 6743ff7dccfa..7c909bc8b68a 100644
> --- a/drivers/gpu/drm/drm_cache.c
> +++ b/drivers/gpu/drm/drm_cache.c
> @@ -131,10 +131,13 @@ drm_clflush_virt_range(void *addr, unsigned long length)
> #if defined(CONFIG_X86)
> if (cpu_has_clflush) {
> const int size = boot_cpu_data.x86_clflush_size;
> - void *end = addr + length;
> - addr = (void *)(((unsigned long)addr) & -size);
> + void *end;
> +
> + end = (void *)(((unsigned long)addr + length - 1) & -size);
> + addr = (void *)((unsigned long)addr & -size);
> +
> mb();
> - for (; addr < end; addr += size)
> + for (; addr <= end; addr += size)
Hm, I can't see how this could make any difference. The old way still
looks ok to me and the new version would flush the exact same cache
lines as the old one using the same addresses (beginning of each cache
line).
> clflushopt(addr);
> mb();
> return;
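
The claim that both loops touch the same cache lines can be checked in
user space. The following is a minimal sketch, not kernel code: the
names flush_span, span_old(), span_new() and same_lines() are
hypothetical, and instead of issuing clflushopt each loop only records
the first line, the last line and the number of lines it would visit.

```c
#include <assert.h>
#include <stdint.h>

/* Cache lines a flush loop would visit: first line, last line, count. */
struct flush_span {
	uintptr_t first;
	uintptr_t last;
	unsigned long count;
};

/* Bounds before the patch: exclusive, unaligned end, "addr < end". */
static struct flush_span span_old(uintptr_t addr, unsigned long length, int size)
{
	struct flush_span s = { 0, 0, 0 };
	uintptr_t end = addr + length;

	assert(size > 0 && (size & (size - 1)) == 0); /* power of two */
	addr &= -(uintptr_t)size;
	for (; addr < end; addr += size) {
		if (s.count++ == 0)
			s.first = addr;
		s.last = addr;
	}
	return s;
}

/* Bounds after the patch: end is the aligned last line, "addr <= end". */
static struct flush_span span_new(uintptr_t addr, unsigned long length, int size)
{
	struct flush_span s = { 0, 0, 0 };
	uintptr_t end = (addr + length - 1) & -(uintptr_t)size;

	assert(size > 0 && (size & (size - 1)) == 0); /* power of two */
	addr &= -(uintptr_t)size;
	for (; addr <= end; addr += size) {
		if (s.count++ == 0)
			s.first = addr;
		s.last = addr;
	}
	return s;
}

/* Returns 1 if both forms agree for every offset and length tried. */
static int same_lines(int size)
{
	const uintptr_t base = 0x1000; /* arbitrary non-zero test base */

	for (uintptr_t off = 0; off < 2 * (uintptr_t)size; off++) {
		for (unsigned long len = 1; len <= 4 * (unsigned long)size; len++) {
			struct flush_span a = span_old(base + off, len, size);
			struct flush_span b = span_new(base + off, len, size);

			if (a.count != b.count || a.first != b.first || a.last != b.last)
				return 0;
		}
	}
	return 1;
}
```

For example, with addr 0x1030, length 0x50 and 64-byte lines, both forms
visit 0x1000 and 0x1040. This only covers the arithmetic of the loop
bounds; it says nothing about what the compiler emits for the real loop.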
* Re: [PATCH] drm: Explicitly compute the last cacheline for clflush on range
2015-10-17 20:03 ` Imre Deak
@ 2015-10-18 12:28 ` Chris Wilson
2015-10-18 13:07 ` Chris Wilson
0 siblings, 1 reply; 6+ messages in thread
From: Chris Wilson @ 2015-10-18 12:28 UTC (permalink / raw)
To: Imre Deak; +Cc: Daniel Vetter, intel-gfx, dri-devel
On Sat, Oct 17, 2015 at 11:03:19PM +0300, Imre Deak wrote:
> On Fri, 2015-10-16 at 20:55 +0100, Chris Wilson wrote:
> > Fixes regression from
> >
> > commit afcd950cafea6e27b739fe7772cbbeed37d05b8b
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date: Wed Jun 10 15:58:01 2015 +0100
> >
> > drm: Avoid the double clflush on the last cache line in drm_clflush_virt_range()
> >
> > I'm stumped. Looking at the loop we should be iterating over every cache
> > line until we reach the start of the cacheline after the end of the
> > virtual range. Evidence says otherwise.
> >
> > More bizarrely, I stored the last address to be clflushed and found it to
> > be equal to the start of the cacheline containing the last byte. Doubly
> > perplexed.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92501
> > Testcase: gem_tiled_partial_pwrite_pread/reads
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Imre Deak <imre.deak@intel.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> > drivers/gpu/drm/drm_cache.c | 9 ++++++---
> > 1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> > index 6743ff7dccfa..7c909bc8b68a 100644
> > --- a/drivers/gpu/drm/drm_cache.c
> > +++ b/drivers/gpu/drm/drm_cache.c
> > @@ -131,10 +131,13 @@ drm_clflush_virt_range(void *addr, unsigned long length)
> > #if defined(CONFIG_X86)
> > if (cpu_has_clflush) {
> > const int size = boot_cpu_data.x86_clflush_size;
> > - void *end = addr + length;
> > - addr = (void *)(((unsigned long)addr) & -size);
> > + void *end;
> > +
> > + end = (void *)(((unsigned long)addr + length - 1) & -size);
> > + addr = (void *)((unsigned long)addr & -size);
> > +
> > mb();
> > - for (; addr < end; addr += size)
> > + for (; addr <= end; addr += size)
>
> Hm, I can't see how this could make any difference. The old way still
> looks ok to me and the new version would flush the exact same cache
> lines as the old one using the same addresses (beginning of each cache
> line).
I couldn't spot the difference either. I am beginning to suspect it is
gcc as
diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 6743ff7..c9097b5 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -130,11 +130,11 @@ drm_clflush_virt_range(void *addr, unsigned long length)
{
#if defined(CONFIG_X86)
if (cpu_has_clflush) {
const int size = boot_cpu_data.x86_clflush_size;
- void *end = addr + length;
+ void *end = addr + length - 1;
addr = (void *)(((unsigned long)addr) & -size);
mb();
- for (; addr < end; addr += size)
+ for (; addr <= end; addr += size)
clflushopt(addr);
mb();
return;
Also fixes gem_tiled_partial_pwrite (on byt and bsw).
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH] drm: Explicitly compute the last cacheline for clflush on range
2015-10-18 12:28 ` Chris Wilson
@ 2015-10-18 13:07 ` Chris Wilson
2015-10-18 16:07 ` [Intel-gfx] " Chris Wilson
0 siblings, 1 reply; 6+ messages in thread
From: Chris Wilson @ 2015-10-18 13:07 UTC (permalink / raw)
To: Imre Deak, intel-gfx, dri-devel, Daniel Vetter
On Sun, Oct 18, 2015 at 01:28:11PM +0100, Chris Wilson wrote:
> On Sat, Oct 17, 2015 at 11:03:19PM +0300, Imre Deak wrote:
> > On Fri, 2015-10-16 at 20:55 +0100, Chris Wilson wrote:
> > > Fixes regression from
> > >
> > > commit afcd950cafea6e27b739fe7772cbbeed37d05b8b
> > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > Date: Wed Jun 10 15:58:01 2015 +0100
> > >
> > > drm: Avoid the double clflush on the last cache line in drm_clflush_virt_range()
> > >
> > > I'm stumped. Looking at the loop we should be iterating over every cache
> > > line until we reach the start of the cacheline after the end of the
> > > virtual range. Evidence says otherwise.
> > >
> > > More bizarrely, I stored the last address to be clflushed and found it to
> > > be equal to the start of the cacheline containing the last byte. Doubly
> > > perplexed.
> > >
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92501
> > > Testcase: gem_tiled_partial_pwrite_pread/reads
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Imre Deak <imre.deak@intel.com>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > ---
> > > drivers/gpu/drm/drm_cache.c | 9 ++++++---
> > > 1 file changed, 6 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> > > index 6743ff7dccfa..7c909bc8b68a 100644
> > > --- a/drivers/gpu/drm/drm_cache.c
> > > +++ b/drivers/gpu/drm/drm_cache.c
> > > @@ -131,10 +131,13 @@ drm_clflush_virt_range(void *addr, unsigned long length)
> > > #if defined(CONFIG_X86)
> > > if (cpu_has_clflush) {
> > > const int size = boot_cpu_data.x86_clflush_size;
> > > - void *end = addr + length;
> > > - addr = (void *)(((unsigned long)addr) & -size);
> > > + void *end;
> > > +
> > > + end = (void *)(((unsigned long)addr + length - 1) & -size);
> > > + addr = (void *)((unsigned long)addr & -size);
> > > +
> > > mb();
> > > - for (; addr < end; addr += size)
> > > + for (; addr <= end; addr += size)
> >
> > Hm, I can't see how this could make any difference. The old way still
> > looks ok to me and the new version would flush the exact same cache
> > lines as the old one using the same addresses (beginning of each cache
> > line).
>
> I couldn't spot the difference either. I am beginning to suspect it is
> gcc as
>
> diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> index 6743ff7..c9097b5 100644
> --- a/drivers/gpu/drm/drm_cache.c
> +++ b/drivers/gpu/drm/drm_cache.c
> @@ -130,11 +130,11 @@ drm_clflush_virt_range(void *addr, unsigned long length)
> {
> #if defined(CONFIG_X86)
> if (cpu_has_clflush) {
> const int size = boot_cpu_data.x86_clflush_size;
> - void *end = addr + length;
> + void *end = addr + length - 1;
> addr = (void *)(((unsigned long)addr) & -size);
> mb();
> - for (; addr < end; addr += size)
> + for (; addr <= end; addr += size)
> clflushopt(addr);
> mb();
> return;
s/clflushopt/clflush/ works just as well.
Plot thickens. Current guess is that gcc doesn't see the constraints
underneath the alternative()?
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [Intel-gfx] [PATCH] drm: Explicitly compute the last cacheline for clflush on range
2015-10-18 13:07 ` Chris Wilson
@ 2015-10-18 16:07 ` Chris Wilson
2015-10-19 8:35 ` Daniel Vetter
0 siblings, 1 reply; 6+ messages in thread
From: Chris Wilson @ 2015-10-18 16:07 UTC (permalink / raw)
To: Imre Deak, intel-gfx, dri-devel, Daniel Vetter
On Sun, Oct 18, 2015 at 02:07:13PM +0100, Chris Wilson wrote:
> > I couldn't spot the difference either. I am beginning to suspect it is
> > gcc as
> >
> > diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> > index 6743ff7..c9097b5 100644
> > --- a/drivers/gpu/drm/drm_cache.c
> > +++ b/drivers/gpu/drm/drm_cache.c
> > @@ -130,11 +130,11 @@ drm_clflush_virt_range(void *addr, unsigned long length)
> > {
> > #if defined(CONFIG_X86)
> > if (cpu_has_clflush) {
> > const int size = boot_cpu_data.x86_clflush_size;
> > - void *end = addr + length;
> > + void *end = addr + length - 1;
> > addr = (void *)(((unsigned long)addr) & -size);
> > mb();
> > - for (; addr < end; addr += size)
> > + for (; addr <= end; addr += size)
> > clflushopt(addr);
> > mb();
> > return;
>
> s/clflushopt/clflush/ works just as well.
>
> Plot thickens. Current guess is that gcc doesn't see the constraints
> underneath the alternative()?
Adding a barrier() after clflushopt() in the loop is sufficient as well.
Almost certain that alternative() is confusing gcc.
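
A minimal user-space sketch of that experiment, under stated
assumptions: fake_clflushopt() and flush_range_with_barrier() are
hypothetical names, the stand-in only records each call rather than
flushing anything, and the surrounding mb() calls are left out. The
barrier() definition is the usual empty asm with a "memory" clobber,
which constrains only the compiler and emits no instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Compiler-only barrier: empty asm with a "memory" clobber. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for clflushopt(): records the call only. */
static unsigned long flush_calls;
static uintptr_t last_flushed;

static void fake_clflushopt(void *addr)
{
	last_flushed = (uintptr_t)addr;
	flush_calls++;
}

/* Original loop bounds, with a barrier() after every flush. */
static void flush_range_with_barrier(void *addr, unsigned long length, int size)
{
	char *p = (char *)((uintptr_t)addr & -(uintptr_t)size);
	char *end = (char *)addr + length;

	assert(size > 0 && (size & (size - 1)) == 0); /* power of two */
	for (; p < end; p += size) {
		fake_clflushopt(p);
		barrier(); /* stop the compiler caching or reordering memory accesses here */
	}
}
```

Because the barrier is purely a compiler constraint, it helping would be
consistent with the suspicion above about gcc and alternative(), but the
sketch does not prove that.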
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
* Re: [Intel-gfx] [PATCH] drm: Explicitly compute the last cacheline for clflush on range
2015-10-18 16:07 ` [Intel-gfx] " Chris Wilson
@ 2015-10-19 8:35 ` Daniel Vetter
0 siblings, 0 replies; 6+ messages in thread
From: Daniel Vetter @ 2015-10-19 8:35 UTC (permalink / raw)
To: Chris Wilson, Imre Deak, intel-gfx, dri-devel, Daniel Vetter
On Sun, Oct 18, 2015 at 05:07:06PM +0100, Chris Wilson wrote:
> On Sun, Oct 18, 2015 at 02:07:13PM +0100, Chris Wilson wrote:
> > > I couldn't spot the difference either. I am beginning to suspect it is
> > > gcc as
> > >
> > > diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> > > index 6743ff7..c9097b5 100644
> > > --- a/drivers/gpu/drm/drm_cache.c
> > > +++ b/drivers/gpu/drm/drm_cache.c
> > > @@ -130,11 +130,11 @@ drm_clflush_virt_range(void *addr, unsigned long length)
> > > {
> > > #if defined(CONFIG_X86)
> > > if (cpu_has_clflush) {
> > > const int size = boot_cpu_data.x86_clflush_size;
> > > - void *end = addr + length;
> > > + void *end = addr + length - 1;
> > > addr = (void *)(((unsigned long)addr) & -size);
> > > mb();
> > > - for (; addr < end; addr += size)
> > > + for (; addr <= end; addr += size)
> > > clflushopt(addr);
> > > mb();
> > > return;
> >
> > s/clflushopt/clflush/ works just as well.
> >
> > Plot thickens. Current guess is that gcc doesn't see the constraints
> > underneath the alternative()?
>
> Adding a barrier() after clflushopt() in the loop is sufficient as well.
> Almost certain that alternative() is confusing gcc.
So adding that barrier() to clflushopt with a massive comment that gcc
gets confused?
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch