Intel-GFX Archive on lore.kernel.org
* hsw rps values regress RPS on Macbook Air
@ 2012-10-09 20:05 Eric Anholt
  2012-10-11 19:55 ` Jesse Barnes
  2012-10-16 15:17 ` Chris Wilson
  0 siblings, 2 replies; 10+ messages in thread
From: Eric Anholt @ 2012-10-09 20:05 UTC
  To: intel-gfx



On my new MBA with danvet's drm-intel-next-queued, I'm not getting
working RPS.  vblank_mode=0 glxgears never ups the frequency, and
vblank_mode=0 openarena only makes it up to 500 MHz.  Reverting
1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c gets the machine to responsive
RPS: fully on while the GPU is busy, fully lowered when it's not.

Since we're always just looking for all-on or all-off and never see
workloads that actually want to be somewhere in between, could we please
just move to race to idle for RPS?

dmi info:

Handle 0x001B, DMI type 1, 27 bytes
System Information
        Manufacturer: Apple Inc.
        Product Name: MacBookAir5,2
        Version: 1.0
        Serial Number: C02JH14TDRVG
        UUID: E58AA5BB-95BE-115D-AE6F-E12B41830529
        Wake-up Type: Power Switch
        SKU Number: System SKU#
        Family: MacBook Air


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: hsw rps values regress RPS on Macbook Air
  2012-10-09 20:05 hsw rps values regress RPS on Macbook Air Eric Anholt
@ 2012-10-11 19:55 ` Jesse Barnes
  2012-10-12 18:34   ` Eric Anholt
  2012-10-16 15:17 ` Chris Wilson
  1 sibling, 1 reply; 10+ messages in thread
From: Jesse Barnes @ 2012-10-11 19:55 UTC
  To: Eric Anholt; +Cc: intel-gfx

On Tue, 09 Oct 2012 13:05:54 -0700
Eric Anholt <eric@anholt.net> wrote:

> On my new MBA with danvet's drm-intel-next-queued, I'm not getting
> working RPS.  vblank_mode=0 glxgears never ups the frequency, and
> vblank_mode=0 openarena only makes it up to 500 MHz.  Reverting
> 1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c gets the machine to responsive
> RPS: fully on while the GPU is busy, fully lowered when it's not.
> 
> Since we're always just looking for all-on or all-off and never see
> workloads that actually want to be somewhere in between, could we please
> just move to race to idle for RPS?

Ramping to the max freq is fine for benchmarking.  But for normal
vblank throttled activity, using the lowest freq (assuming it's
above our nominal freq) that can hit the refresh is the right answer
from a power perspective.

-- 
Jesse Barnes, Intel Open Source Technology Center


* Re: hsw rps values regress RPS on Macbook Air
  2012-10-11 19:55 ` Jesse Barnes
@ 2012-10-12 18:34   ` Eric Anholt
  2012-10-16 13:53     ` Jesse Barnes
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Anholt @ 2012-10-12 18:34 UTC
  To: Jesse Barnes; +Cc: intel-gfx



Jesse Barnes <jbarnes@virtuousgeek.org> writes:

> On Tue, 09 Oct 2012 13:05:54 -0700
> Eric Anholt <eric@anholt.net> wrote:
>
>> On my new MBA with danvet's drm-intel-next-queued, I'm not getting
>> working RPS.  vblank_mode=0 glxgears never ups the frequency, and
>> vblank_mode=0 openarena only makes it up to 500 MHz.  Reverting
>> 1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c gets the machine to responsive
>> RPS: fully on while the GPU is busy, fully lowered when it's not.
>> 
>> Since we're always just looking for all-on or all-off and never see
>> workloads that actually want to be somewhere in between, could we please
>> just move to race to idle for RPS?
>
> Ramping to the max freq is fine for benchmarking.  But for normal
> vblank throttled activity, using the lowest freq (assuming it's
> above our nominal freq) that can hit the refresh is the right answer
> from a power perspective.

Have you seen any workloads where a middle frequency value is actually
chosen by the current RPS system?


* Re: hsw rps values regress RPS on Macbook Air
  2012-10-12 18:34   ` Eric Anholt
@ 2012-10-16 13:53     ` Jesse Barnes
  2012-10-16 14:38       ` Daniel Vetter
  2012-10-16 19:50       ` Eric Anholt
  0 siblings, 2 replies; 10+ messages in thread
From: Jesse Barnes @ 2012-10-16 13:53 UTC
  To: Eric Anholt; +Cc: intel-gfx

On Fri, 12 Oct 2012 11:34:08 -0700
Eric Anholt <eric@anholt.net> wrote:

> Jesse Barnes <jbarnes@virtuousgeek.org> writes:
> 
> > On Tue, 09 Oct 2012 13:05:54 -0700
> > Eric Anholt <eric@anholt.net> wrote:
> >
> >> On my new MBA with danvet's drm-intel-next-queued, I'm not getting
> >> working RPS.  vblank_mode=0 glxgears never ups the frequency, and
> >> vblank_mode=0 openarena only makes it up to 500 MHz.  Reverting
> >> 1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c gets the machine to responsive
> >> RPS: fully on while the GPU is busy, fully lowered when it's not.
> >> 
> >> Since we're always just looking for all-on or all-off and never see
> >> workloads that actually want to be somewhere in between, could we please
> >> just move to race to idle for RPS?
> >
> > Ramping to the max freq is fine for benchmarking.  But for normal
> > vblank throttled activity, using the lowest freq (assuming it's
> > above our nominal freq) that can hit the refresh is the right answer
> > from a power perspective.
> 
> Have you seen any workloads where a middle frequency value is actually
> chosen by the current RPS system?

I can't tell if this is a snarky response or not. :)  But either way it
misses my point: I think the current RPS system isn't ideal for many of
our workloads and the way our GL stack runs things.  I've thought we
could do better for a while now but couldn't think of a way that would
let userspace request lower frequencies if it didn't need the extra
processing power, but if we collect a little data in Mesa maybe we can
do it.

I propose a new ioctl, I915_FREQ_REQUEST, with 3 different parameters,
I915_MAX_FREQ, I915_MORE_FREQ, and I915_LESS_FREQ.  The first would
tell the kernel the app would like to run at the maximum possible
speed, regardless of power or throttling considerations.  MORE
would simply tell the kernel the app needs a higher frequency to meet
its frame rate target, and LESS would tell the kernel it could run
slower and still hit its target.

In Mesa, we'd need to track the FPS target for the app, the current FPS
(e.g. over the last second, or using a decaying average with some
weight toward recent activity), and the time between swapbuffers calls
(as an approximation of how long it takes us to draw each frame).
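A minimal sketch of the bookkeeping described above, assuming a decaying
average of the swap-to-swap time.  The struct, the function names, and the
0.1 weight are hypothetical illustrations, not existing Mesa code.

```c
/* Hypothetical per-context frame statistics; not existing Mesa code. */
struct frame_stats {
	double last_swap_s;  /* timestamp of the previous swap, in seconds */
	double avg_frame_s;  /* decaying average of the swap-to-swap time */
	int have_last;       /* set once one swap has been seen */
	int have_avg;        /* set once one interval has been measured */
};

/* Weight of the newest sample; an arbitrary value chosen for illustration. */
#define FRAME_AVG_WEIGHT 0.1

static void frame_stats_init(struct frame_stats *st)
{
	st->last_swap_s = 0.0;
	st->avg_frame_s = 0.0;
	st->have_last = 0;
	st->have_avg = 0;
}

/* Call once per swapbuffers with the current time in seconds. */
static void frame_stats_swap(struct frame_stats *st, double now_s)
{
	if (st->have_last) {
		double dt = now_s - st->last_swap_s;

		if (!st->have_avg) {
			/* First interval seeds the average. */
			st->avg_frame_s = dt;
			st->have_avg = 1;
		} else {
			/* Move the average a fraction of the way toward dt. */
			st->avg_frame_s += FRAME_AVG_WEIGHT * (dt - st->avg_frame_s);
		}
	}
	st->last_swap_s = now_s;
	st->have_last = 1;
}

/* Current FPS estimate, or 0.0 until an interval has been measured. */
static double frame_stats_fps(const struct frame_stats *st)
{
	if (!st->have_avg || st->avg_frame_s <= 0.0)
		return 0.0;
	return 1.0 / st->avg_frame_s;
}
```

A larger weight reacts faster to scene changes but is noisier; the right
value would have to come from measurement.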

Periodically (maybe every second when we update our current FPS), Mesa
would either request more frequency if it wasn't hitting its FPS
target, or less frequency if its frame draw time was less than 90% of
the maximum allotted frame time (the period for the frequency we're
trying to hit).  The FPS target would be based on the swap interval for
the app.

In a benchmarking mode (i.e. vblank_mode=0 or swapinterval set to 0),
we could just make an I915_MAX_FREQ request and be done with it.
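The userspace decision could be sketched roughly as follows.  This is only
an illustration under stated assumptions: the enum mirrors the parameter
names proposed in this mail (no such ioctl exists in the kernel), and the
function name and arguments are hypothetical.

```c
/* Mirrors the parameters proposed above; not an existing kernel interface. */
enum freq_hint {
	FREQ_HINT_NONE,  /* target met with little slack: request nothing */
	FREQ_HINT_MAX,   /* proposed I915_MAX_FREQ */
	FREQ_HINT_MORE,  /* proposed I915_MORE_FREQ */
	FREQ_HINT_LESS   /* proposed I915_LESS_FREQ */
};

/*
 * swap_interval: the app's swap interval (0 means unthrottled)
 * refresh_hz:    display refresh rate
 * current_fps:   measured frame rate (e.g. a decaying average)
 * frame_time_s:  measured time to draw one frame, in seconds
 */
static enum freq_hint choose_freq_hint(int swap_interval, double refresh_hz,
				       double current_fps, double frame_time_s)
{
	double target_fps, frame_budget_s;

	/* Benchmarking mode: ask for the maximum and be done with it. */
	if (swap_interval <= 0)
		return FREQ_HINT_MAX;
	if (refresh_hz <= 0.0)
		return FREQ_HINT_NONE;

	/* The FPS target follows from the swap interval. */
	target_fps = refresh_hz / swap_interval;
	frame_budget_s = 1.0 / target_fps;

	if (current_fps < target_fps)
		return FREQ_HINT_MORE;
	/* Drawing takes less than 90% of the allotted frame time. */
	if (frame_time_s < 0.9 * frame_budget_s)
		return FREQ_HINT_LESS;
	return FREQ_HINT_NONE;
}
```

A real implementation would probably want some tolerance on the FPS
comparison so that measurement noise does not produce a constant stream of
MORE requests.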

Within the kernel, we'd evaluate every app's requests and choose the
max frequency requested, re-setting things on every ioctl call and when
apps close.
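One possible shape for the kernel side, as a sketch only.  It assumes each
client tracks a wanted frequency step that MORE and LESS nudge by one; the
struct, the function names, and the one-step increments are hypothetical,
and nothing here is existing i915 code.

```c
#include <stddef.h>

/* Mirrors the proposed request types; not an existing kernel interface. */
enum freq_req { FREQ_REQ_MAX, FREQ_REQ_MORE, FREQ_REQ_LESS };

/* Hypothetical per-client state. */
struct freq_client {
	int active;  /* cleared when the app closes */
	int wanted;  /* frequency step this client currently asks for */
};

static int clamp_step(int v, int lo, int hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/* Apply one request from a client, keeping it within [min_step, max_step]. */
static void freq_client_request(struct freq_client *c, enum freq_req req,
				int min_step, int max_step)
{
	switch (req) {
	case FREQ_REQ_MAX:
		c->wanted = max_step;
		break;
	case FREQ_REQ_MORE:
		c->wanted = clamp_step(c->wanted + 1, min_step, max_step);
		break;
	case FREQ_REQ_LESS:
		c->wanted = clamp_step(c->wanted - 1, min_step, max_step);
		break;
	}
}

/* Re-evaluated on every request and on close: the highest request wins. */
static int freq_effective_step(const struct freq_client *clients, size_t n,
			       int min_step)
{
	int best = min_step;
	size_t i;

	for (i = 0; i < n; i++) {
		if (clients[i].active && clients[i].wanted > best)
			best = clients[i].wanted;
	}
	return best;
}
```

Taking the maximum means one demanding client keeps the frequency up for
everyone, and the frequency drops back as soon as that client closes.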

Any thoughts?  Would collecting the above info in Mesa be pretty easy?
I think we already collect FPS info if certain debug flags are set, and
frame time seems like a pretty trivial calculation based on some
timestamping in a couple of places...

Thanks,
Jesse


* Re: hsw rps values regress RPS on Macbook Air
  2012-10-16 13:53     ` Jesse Barnes
@ 2012-10-16 14:38       ` Daniel Vetter
  2012-10-16 14:53         ` Jesse Barnes
  2012-10-16 19:50       ` Eric Anholt
  1 sibling, 1 reply; 10+ messages in thread
From: Daniel Vetter @ 2012-10-16 14:38 UTC
  To: Jesse Barnes; +Cc: intel-gfx

On Tue, Oct 16, 2012 at 3:53 PM, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> Any thoughts?  Would collecting the above info in Mesa be pretty easy?
> I think we already collect FPS info if certain debug flags are set, and
> frame time seems like a pretty trivial calculation based on some
> timestamping in a couple of places...

How does this work with two or more clients rendering to the gpu, each
pretty much oblivious to what the others are doing (i.e. your
composited desktop case)? And if that works, how does it figure out
that the 25 fps video client indeed doesn't want to draw more than
that on a 50 fps screen (which is presumably the frame target)? Note
that the 25 fps case is very tricky, since a lot of workloads with too
little parallelism between the gpu and cpu and hence lots of idle
time on both run fastest if you lock both gpu and cpu to the max. And
since 2d flips a lot between hw/sw rendering, your mostly idle desktop
is a prime example of this.

Also, I totally don't see how that differs from what the hw does, save
that we don't know the exact algo the hw implements: Instead of the hw
generating up/down requests, userspace does the same. And the pin to max
is already implemented in sysfs now. The other issue is that userspace
individually lacks the global overview.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: hsw rps values regress RPS on Macbook Air
  2012-10-16 14:38       ` Daniel Vetter
@ 2012-10-16 14:53         ` Jesse Barnes
  0 siblings, 0 replies; 10+ messages in thread
From: Jesse Barnes @ 2012-10-16 14:53 UTC
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, 16 Oct 2012 16:38:02 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Tue, Oct 16, 2012 at 3:53 PM, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> > Any thoughts?  Would collecting the above info in Mesa be pretty easy?
> > I think we already collect FPS info if certain debug flags are set, and
> > frame time seems like a pretty trivial calculation based on some
> > timestamping in a couple of places...
> 
> How does this work with two or more clients rendering to the gpu, each
> pretty much oblivious to what the others are doing (i.e. your
> composited desktop case)? And if that works, how does it figure out
> that the 25 fps video client indeed doesn't want to draw more than
> that on a 50 fps screen (which is presumably the frame target)? Note

How does what figure it out?  A client that wants to draw at 25fps
would use 25fps as its target framerate, not 50.  And it wouldn't
request more GPU frequency unless it wasn't hitting that target.  But
the media stack isn't something I've considered much.  I think we can
use the same approach there in general, but things get a little trickier
with encoding and transcoding, since you don't necessarily want to run
flat out all the time, but you also don't have an FPS target unless
you're doing realtime transcode for camera->display or something.

> that the 25 fps case is very tricky, since a lot of workloads with too
> little parallelism between the gpu and cpu and hence lots of idle
> time on both run fastest if you lock both gpu and cpu to the max. And
> since 2d flips a lot between hw/sw rendering, your mostly idle desktop
> is a prime example of this.

We need the GPU to go as fast as the most demanding client requires, or
in the case of multiple competing clients, as fast as is needed to
drive them all at their target framerate.  The GL stack should take
care of making requests to make this happen based on what I outlined.

> Also, I totally don't see how that differs from what the hw does, save
> that we don't know the exact algo the hw implements: Instead of the hw
> generating up/down requests, userspace does the same. And the pin to max
> is already implemented in sysfs now. The other issue is that userspace
> individually lacks the global overview.

Only the kernel has the global view.  But each client will know if it's
hitting its target or not, and can request more frequency if not (which
could be due to its own demands or that other stuff is taking some GPU
too).

The big difference between this and the hw mechanism is that here we're
basing our frequency requests on what we actually need (the target
framerate).  The hw currently has no idea about this, and can't really,
unless we tell it.  So for example the current stuff will increase the
GPU frequency beyond nominal when an app runs, even if it's hitting its
target framerate, which is less efficient than just leaving the freq at
nominal, which is what this would do.

Jesse


* Re: hsw rps values regress RPS on Macbook Air
  2012-10-09 20:05 hsw rps values regress RPS on Macbook Air Eric Anholt
  2012-10-11 19:55 ` Jesse Barnes
@ 2012-10-16 15:17 ` Chris Wilson
  1 sibling, 0 replies; 10+ messages in thread
From: Chris Wilson @ 2012-10-16 15:17 UTC
  To: Eric Anholt, intel-gfx

On Tue, 09 Oct 2012 13:05:54 -0700, Eric Anholt <eric@anholt.net> wrote:
> On my new MBA with danvet's drm-intel-next-queued, I'm not getting
> working RPS.  vblank_mode=0 glxgears never ups the frequency, and
> vblank_mode=0 openarena only makes it up to 500 MHz.  Reverting
> 1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c gets the machine to responsive
> RPS: fully on while the GPU is busy, fully lowered when it's not.

I can confirm this. The issue is that whilst a GL client is active we
never see a subsequent RPS interrupt, neither up nor down. In
particular, it is the value of UP_EI that throws us off, even though
we do not use the EI mode for determining RPS interrupts.

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 6b2ea80..4e5fc33 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -2492,7 +2492,7 @@ static void gen6_enable_rps(struct drm_device *dev)
 
        I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
        I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
-       I915_WRITE(GEN6_RP_UP_EI, 66000);
+       I915_WRITE(GEN6_RP_UP_EI, 100000);
        I915_WRITE(GEN6_RP_DOWN_EI, 350000);
 
        I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);

> Since we're always just looking for all-on or all-off and never see
> workloads that actually want to be somewhere in between, could we please
> just move to race to idle for RPS?

I believe that is more or less the purpose of the AGGRESSIVE_TURBO policy
that is enabled by default, but as with anything to do with RPS the
absence of documentation is remarkable.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: hsw rps values regress RPS on Macbook Air
  2012-10-16 13:53     ` Jesse Barnes
  2012-10-16 14:38       ` Daniel Vetter
@ 2012-10-16 19:50       ` Eric Anholt
  2012-10-16 20:15         ` Jesse Barnes
  2012-10-16 20:55         ` Jesse Barnes
  1 sibling, 2 replies; 10+ messages in thread
From: Eric Anholt @ 2012-10-16 19:50 UTC
  To: Jesse Barnes; +Cc: intel-gfx



Jesse Barnes <jbarnes@virtuousgeek.org> writes:

> On Fri, 12 Oct 2012 11:34:08 -0700
> Eric Anholt <eric@anholt.net> wrote:
>
>> Jesse Barnes <jbarnes@virtuousgeek.org> writes:
>> 
>> > On Tue, 09 Oct 2012 13:05:54 -0700
>> > Eric Anholt <eric@anholt.net> wrote:
>> >
>> >> On my new MBA with danvet's drm-intel-next-queued, I'm not getting
>> >> working RPS.  vblank_mode=0 glxgears never ups the frequency, and
>> >> vblank_mode=0 openarena only makes it up to 500 MHz.  Reverting
>> >> 1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c gets the machine to responsive
>> >> RPS: fully on while the GPU is busy, fully lowered when it's not.
>> >> 
>> >> Since we're always just looking for all-on or all-off and never see
>> >> workloads that actually want to be somewhere in between, could we please
>> >> just move to race to idle for RPS?
>> >
>> > Ramping to the max freq is fine for benchmarking.  But for normal
>> > vblank throttled activity, using the lowest freq (assuming it's
>> > above our nominal freq) that can hit the refresh is the right answer
>> > from a power perspective.
>> 
>> Have you seen any workloads where a middle frequency value is actually
>> chosen by the current RPS system?
>
> I can't tell if this is a snarky response or not. :)  But either way it
> misses my point: I think the current RPS system isn't ideal for many of
> our workloads and the way our GL stack runs things.  I've thought we
> could do better for a while now but couldn't think of a way that would
> let userspace request lower frequencies if it didn't need the extra
> processing power, but if we collect a little data in Mesa maybe we can
> do it.

It's not snarky, I'm really wondering if you've actually seen middle
frequencies like this software is designed to do.  I spend a lot of time
looking at performance, and whenever I look at RPS state, it's either at
highest or lowest, or not working.  I've never seen a functioning
workload that stays at the middle.  Testing with the hsw values
reverted:

openarena vblank-synced at 60fps (can do 240), my clock bounces between
350 and 1150.

nexuiz vblank-synced isn't hitting 60 here, and the clock isn't getting
all the way to 1150 (saw I think 900 and 1100 a bunch), and the CPU
isn't the bottleneck as measured by top.  So this is the only case of
these 3 that's actually choosing a middle frequency, but it shouldn't
be.

I even tried using glxgears vblank-synced, resizing my window.  As I
scale up, it seems to spend more time at 1150 instead of 350, but it's
not choosing something in the middle, even though it seems like the most
obvious workload for this middle frequency support.

I'd like to replace "not working" with "high when busy, low when not",
while you're saying that we have to support a middle frequency like the
complicated software is trying to achieve.

> I propose a new ioctl, I915_FREQ_REQUEST, with 3 different parameters,
> I915_MAX_FREQ, I915_MORE_FREQ, and I915_LESS_FREQ.  The first would
> tell the kernel the app would like to run at the maximum possible
> speed, regardless of power or throttling considerations.  MORE
> would simply tell the kernel the app needs a higher frequency to meet
> its frame rate target, and LESS would tell the kernel it could run
> slower and still hit its target.
>
> In Mesa, we'd need to track the FPS target for the app, the current FPS
> (e.g. over the last second, or using a decaying average with some
> weight toward recent activity), and the time between swapbuffers calls
> (as an approximation of how long it takes us to draw each frame).
>
> Periodically (maybe every second when we update our current FPS), Mesa
> would either request more frequency if it wasn't hitting its FPS
> target, or less frequency if its frame draw time was less than 90% of
> the maximum allotted frame time (the period for the frequency we're
> trying to hit).  The FPS target would be based on the swap interval for
> the app.
>
> In a benchmarking mode (i.e. vblank_mode=0 or swapinterval set to 0),
> we could just make an I915_MAX_FREQ request and be done with it.
>
> Within the kernel, we'd evaluate every app's requests and choose the
> max frequency requested, re-setting things on every ioctl call and when
> apps close.
>
> Any thoughts?  Would collecting the above info in Mesa be pretty easy?
> I think we already collect FPS info if certain debug flags are set, and
> frame time seems like a pretty trivial calculation based on some
> timestamping in a couple of places...

Unfortunately, enough apps don't use swap interval, and instead use
SGI_video_sync or OML_sync_control.  In that case, we don't know the
swap interval outside of a blocking call, unless we look at their
history and try to guess.  It sounds ugly, and I guess we'd basically
end up with I915_MAX_FREQ as our policy.

The design is also predicated on some bad assumptions.  One is that
frame-to-frame workloads stay consistent.  3x difference in work between
high and low-framerate scenes within an app I'd say is normal, and you'd
need to be able to recognize that change and fix the frequency within
half a second in the worst case I'd think.  Think about your compositor,
too: right now it's updating a character at a time as I type, then I go
hit the expose button and it has to redraw the whole screen and that's a
waaay different workload.  I want responsiveness.

The other bad assumption I think is that there's a bunch of headroom for us
to reduce the frequency.  Games are tuned to the hardware, to be able to
barely hit 60fps -- if you're way over 60, then either the app turns on
more pretty graphics options or you do.  You don't have a bunch of extra
space to play with turning down the frequency.


* Re: hsw rps values regress RPS on Macbook Air
  2012-10-16 19:50       ` Eric Anholt
@ 2012-10-16 20:15         ` Jesse Barnes
  2012-10-16 20:55         ` Jesse Barnes
  1 sibling, 0 replies; 10+ messages in thread
From: Jesse Barnes @ 2012-10-16 20:15 UTC
  To: Eric Anholt; +Cc: intel-gfx

On Tue, 16 Oct 2012 12:50:22 -0700
Eric Anholt <eric@anholt.net> wrote:
> I'd like to replace "not working" with "high when busy, low when not",
> while you're saying that we have to support a middle frequency like the
> complicated software is trying to achieve.

Well I think this is pretty simple; much simpler than the underlying hw
system at least, and comparable to our sw support of it.

But for frequencies, you're correct.  We should run at the minimum
frequency we can that will still achieve our target framerate.

> Unfortunately, enough apps don't use swap interval, and instead use
> SGI_video_sync or OML_sync_control.  In that case, we don't know the
> swap interval outside of a blocking call, unless we look at their
> history and try to guess.  It sounds ugly, and I guess we'd basically
> end up with I915_MAX_FREQ as our policy.

Don't all apps have a default swap interval at least?  If they're doing
additional blocking beyond that we'd lose the target fps information, I
agree.

> The design is also predicated on some bad assumptions.  One is that
> frame-to-frame workloads stay consistent.  3x difference in work between
> high and low-framerate scenes within an app I'd say is normal, and you'd
> need to be able to recognize that change and fix the frequency within
> half a second in the worst case I'd think.  Think about your compositor,
> too: right now it's updating a character at a time as I type, then I go
> hit the expose button and it has to redraw the whole screen and that's a
> waaay different workload.  I want responsiveness.

I don't want to assume constant work at all; I know scenes vary widely
in complexity over time.  If we timestamp the execution of buffers we
send in, we can do very fine grained requests if we notice they're
starting to take longer.

> The other bad assumption I think is that there's a bunch of headroom for us
> to reduce the frequency.  Games are tuned to the hardware, to be able to
> barely hit 60fps -- if you're way over 60, then either the app turns on
> more pretty graphics options or you do.  You don't have a bunch of extra
> space to play with turning down the frequency.

Yeah, games are generally demanding.  But we also have desktops and
phone UIs that aren't, and those are pretty common cases too.

Jesse


* Re: hsw rps values regress RPS on Macbook Air
  2012-10-16 19:50       ` Eric Anholt
  2012-10-16 20:15         ` Jesse Barnes
@ 2012-10-16 20:55         ` Jesse Barnes
  1 sibling, 0 replies; 10+ messages in thread
From: Jesse Barnes @ 2012-10-16 20:55 UTC
  To: Eric Anholt; +Cc: intel-gfx

On Tue, 16 Oct 2012 12:50:22 -0700
Eric Anholt <eric@anholt.net> wrote:
> It's not snarky, I'm really wondering if you've actually seen middle
> frequencies like this software is designed to do.  I spend a lot of time
> looking at performance, and whenever I look at RPS state, it's either at
> highest or lowest, or not working.  I've never seen a functioning
> workload that stays at the middle.  Testing with the hsw values
> reverted:

Lemme step back here a minute too, since some people have had strong
reactions to this proposal.

My underlying assumption here is that we're not executing with as much
power efficiency as we could.  This is more the case with apps (desktop
environments, non-gaming apps) that don't stress the GPU much, but may
also be true for some games we can run faster than 60fps.

In an ideal world, we'd run everything at the minimum frequency
required to get it to whatever target framerate it has (60fps for
games, 24fps or whatever for video, or some variable number for a
compositor that has intermittent activity).  We'd do that because even
though we take more time to draw a frame, keeping the frequency low is
more efficient from a power perspective, so over a given time period we
end up consuming fewer joules than if we "race to idle" at the highest
frequency.
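Whether the lower frequency really costs fewer joules depends on how power
scales with frequency, and this thread gives no measurements.  As a sketch
under a simple assumed model (constant power while busy, constant power
while idle, work measured in GPU cycles), the energy over one period can be
compared like this.  The function and its parameters are hypothetical.

```c
/*
 * Energy in joules used over one period of period_s seconds when
 * work_cycles of GPU work run at freq_hz and the GPU idles for the rest.
 * active_w and idle_w are the assumed busy and idle power in watts.
 * Returns -1.0 if the inputs are invalid or the work does not fit.
 */
static double period_energy_j(double period_s, double work_cycles,
			      double freq_hz, double active_w, double idle_w)
{
	double busy_s;

	if (freq_hz <= 0.0 || period_s <= 0.0)
		return -1.0;

	busy_s = work_cycles / freq_hz;
	if (busy_s > period_s)
		return -1.0;  /* frame deadline missed at this frequency */

	return active_w * busy_s + idle_w * (period_s - busy_s);
}
```

Under this model the lower frequency wins only when
(P_low - P_idle) / f_low is smaller than (P_high - P_idle) / f_high, so the
comparison needs real per-platform power numbers before it says anything
about race to idle.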

Unfortunately, achieving that ideal is difficult.  Today's
implementation doesn't take into account any fps target, and uses some
sw programmed weight values to figure out when to trigger a frequency
increase or decrease.  The values we have today came from data
collection done with the Windows driver, which uses a different gfx
stack and has very different behavior from ours.  We should probably
tune them based on our own workloads, on a per-platform basis.

But given that we don't give the hw any information about when we need
things done or whether we're doing what we want to do, I thought we
might be able to do better with some additional sw assistance.

It's possible we could combine the proposal I sent out earlier (which
boils down to userspace telling the kernel whether it's getting the
performance it wants or not) with the hw-assisted mechanism, along with
some additional tuning, or we could do something else entirely.

Anyway, that's the background and I probably should have sent it out
first.

Jesse


Thread overview: 10 messages
2012-10-09 20:05 hsw rps values regress RPS on Macbook Air Eric Anholt
2012-10-11 19:55 ` Jesse Barnes
2012-10-12 18:34   ` Eric Anholt
2012-10-16 13:53     ` Jesse Barnes
2012-10-16 14:38       ` Daniel Vetter
2012-10-16 14:53         ` Jesse Barnes
2012-10-16 19:50       ` Eric Anholt
2012-10-16 20:15         ` Jesse Barnes
2012-10-16 20:55         ` Jesse Barnes
2012-10-16 15:17 ` Chris Wilson
