* FPS performance increase when deliberately spinning the CPU with an unrelated task
@ 2010-10-23 12:02 Peter Clifton
2010-10-25 20:11 ` Jesse Barnes
2010-10-25 20:14 ` Eric Anholt
0 siblings, 2 replies; 7+ messages in thread
From: Peter Clifton @ 2010-10-23 12:02 UTC
To: intel-gfx
Hi guys,
This is something I've noted before, and I think Keith P replied with
some idea of what might be causing it, but I can't recall exactly. I
just thought I'd mention it again in case it struck a chord with anyone.
I'm running my app here, which is on a benchmark test, banging out
frames as fast as the poor thing can manage. It is not CPU bound (it is
using about 50% CPU).
I'm getting 12 fps.
Now I run a devious little test app, "loop", in parallel:
int main(int argc, char **argv)
{
	while (1)
		;	/* spin forever to keep one core busy */
}
Re-run the benchmark and I get 19.2 fps. (NICE).
I suspect cpufreq scaling, so I swapped the ondemand governor for
performance.
Strangely:
pcjc2@pcjc2lap:/sys/devices/system/cpu/cpu1/cpufreq$ cat scaling_available_frequencies
2401000 2400000 1600000 800000
and I only get:
sudo cat cpuinfo_cur_freq
2400000
(Never mind)
Repeat setting for other core of Core2 Duo.
Now, without my "loop" program running, I get 17.6 fps right off.
WITH my "loop" program running, I get 18.2 fps.
I think Keith was thinking that there are some parts of the chipset
which are shared between the GPU and CPU (memory controllers?), and the
CPU entering a lower frequency state could have a detrimental effect on
the graphics throughput.
I know in heavy workloads the CPU is likely to be "a bit" busy, and
rendering will not be totally GPU bound, but it would seem like it is
eventually necessary to have some hook to bump the CPU frequency (or
chipset frequency?) when the GPU would make beneficial use of the extra
throughput.
This doesn't make sense if it is banging out 100fps, but for my stuff,
the GPU is struggling to make 5fps for some complex circuit boards. I'm
trying to address that from a geometry / rendering complexity point of
view, but also, I'd love to see my laptop being able to get the best out
of its hardware.
Perhaps we need to account for periods when the CPU has tasks idle
waiting for GPU operations which would be sped up by increasing some
chip power state.
I'm probably not up to coding this all, but if the idea sounds feasible,
I'd love to know, so I might be able to have a tinker with it.
Best regards,
--
Peter Clifton
Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA
Tel: +44 (0)7729 980173 - (No signal in the lab!)
Tel: +44 (0)1223 748328 - (Shared lab phone, ask for me)
* Re: FPS performance increase when deliberately spinning the CPU with an unrelated task
2010-10-23 12:02 FPS performance increase when deliberately spinning the CPU with an unrelated task Peter Clifton
@ 2010-10-25 20:11 ` Jesse Barnes
2010-10-25 20:20 ` Jesse Barnes
2010-10-25 20:14 ` Eric Anholt
1 sibling, 1 reply; 7+ messages in thread
From: Jesse Barnes @ 2010-10-25 20:11 UTC
To: Peter Clifton; +Cc: intel-gfx
On Sat, 23 Oct 2010 13:02:35 +0100
Peter Clifton <pcjc2@cam.ac.uk> wrote:
> I think Keith was thinking that there are some parts of the chipset
> which are shared between the GPU and CPU (memory controllers?), and the
> CPU entering a lower frequency state could have a detrimental effect on
> the graphics throughput.
>
> I know in heavy workloads the CPU is likely to be "a bit" busy, and
> rendering will not be totally GPU bound, but it would seem like it is
> eventually necessary to have some hook to bump the CPU frequency (or
> chipset frequency?) when the GPU would make beneficial use of the extra
> throughput.
>
> This doesn't make sense if it is banging out 100fps, but for my stuff,
> the GPU is struggling to make 5fps for some complex circuit boards. I'm
> trying to address that from a geometry / rendering complexity point of
> view, but also, I'd love to see my laptop being able to get the best out
> of its hardware.
>
> Perhaps we need to account for periods when the CPU has tasks idle
> waiting for GPU operations which would be sped up by increasing some
> chip power state.
>
> I'm probably not up to coding this all, but if the idea sounds feasible,
> I'd love to know, so I might be able to have a tinker with it.
There are some bits in the GMCH to control memory behavior during CPU
C-states. Can you dump the 16 bits at MCHBAR address 0xf08? You
should be able to do that by doing I915_READ16(MCHBAR_MIRROR_BASE +
0xf08). Assuming bits 3:2 and 1:0 are nonzero, it may help to set them
all to 0. That will disable several memory related power saving
features while the CPU is in a deep sleep state.
--
Jesse Barnes, Intel Open Source Technology Center
* Re: FPS performance increase when deliberately spinning the CPU with an unrelated task
2010-10-23 12:02 FPS performance increase when deliberately spinning the CPU with an unrelated task Peter Clifton
2010-10-25 20:11 ` Jesse Barnes
@ 2010-10-25 20:14 ` Eric Anholt
2010-10-26 19:34 ` Peter Clifton
1 sibling, 1 reply; 7+ messages in thread
From: Eric Anholt @ 2010-10-25 20:14 UTC
To: Peter Clifton, intel-gfx
On Sat, 23 Oct 2010 13:02:35 +0100, Peter Clifton <pcjc2@cam.ac.uk> wrote:
> Hi guys,
>
> This is something I've noted before, and I think Keith P replied with
> some idea of what might be causing it, but I can't recall exactly. I
> just thought I'd mention it again in case it struck a chord with anyone.
>
> I'm running my app here, which is on a benchmark test, banging out
> frames as fast as the poor thing can manage. It is not CPU bound (it is
> using about 50% CPU).
>
> I'm getting 12 fps.
>
> Now I run a devious little test app, "loop", in parallel:
>
> int main( int argc, char **argv )
> {
> while (1);
> }
>
>
> Re-run the benchmark and I get 19.2 fps. (NICE).
>
>
> I suspect cpufreq scaling, so I swapped the ondemand governor for
> performance.
>
> Strangely:
> pcjc2@pcjc2lap:/sys/devices/system/cpu/cpu1/cpufreq$ cat scaling_available_frequencies
> 2401000 2400000 1600000 800000
>
> and I only get:
> sudo cat cpuinfo_cur_freq
> 2400000
>
> (Never mind)
>
> Repeat setting for other core of Core2 Duo.
>
>
> Now, without my "loop" program running, I get 17.6 fps right off.
> WITH my "loop" program running, I get 18.2 fps.
>
> I think Keith was thinking that there are some parts of the chipset
> which are shared between the GPU and CPU (memory controllers?), and the
> CPU entering a lower frequency state could have a detrimental effect on
> the graphics throughput.
>
> I know in heavy workloads the CPU is likely to be "a bit" busy, and
> rendering will not be totally GPU bound, but it would seem like it is
> eventually necessary to have some hook to bump the CPU frequency (or
> chipset frequency?) when the GPU would make beneficial use of the extra
> throughput.
>
> This doesn't make sense if it is banging out 100fps, but for my stuff,
> the GPU is struggling to make 5fps for some complex circuit boards. I'm
> trying to address that from a geometry / rendering complexity point of
> view, but also, I'd love to see my laptop being able to get the best out
> of its hardware.
>
> Perhaps we need to account for periods when the CPU has tasks idle
> waiting for GPU operations which would be sped up by increasing some
> chip power state.
>
> I'm probably not up to coding this all, but if the idea sounds feasible,
> I'd love to know, so I might be able to have a tinker with it.
Instead of just watching frequency, maybe use powertop to watch CPU
C-states as well? I'd suspect those to have more impact on graphics
than CPU frequency, though the two should be related in terms of when
they're changed.
C-state drops don't appear to matter for performance here on Ironlake
with my test app, but things may be different for your hw. If C-state
reductions matter, I think there's supposed to be a way for the kernel
waits to declare how long they expect to wait (1/refresh, I'd say) so as
to not drop C-state if it won't pay off. We should be declaring that if
we can anyway, and it might help your workload.
* Re: FPS performance increase when deliberately spinning the CPU with an unrelated task
2010-10-25 20:11 ` Jesse Barnes
@ 2010-10-25 20:20 ` Jesse Barnes
2010-10-26 0:14 ` Peter Clifton
0 siblings, 1 reply; 7+ messages in thread
From: Jesse Barnes @ 2010-10-25 20:20 UTC
To: Peter Clifton; +Cc: intel-gfx
On Mon, 25 Oct 2010 13:11:24 -0700
Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> On Sat, 23 Oct 2010 13:02:35 +0100
> Peter Clifton <pcjc2@cam.ac.uk> wrote:
> > I think Keith was thinking that there are some parts of the chipset
> > which are shared between the GPU and CPU (memory controllers?), and the
> > CPU entering a lower frequency state could have a detrimental effect on
> > the graphics throughput.
> >
> > I know in heavy workloads the CPU is likely to be "a bit" busy, and
> > rendering will not be totally GPU bound, but it would seem like it is
> > eventually necessary to have some hook to bump the CPU frequency (or
> > chipset frequency?) when the GPU would make beneficial use of the extra
> > throughput.
> >
> > This doesn't make sense if it is banging out 100fps, but for my stuff,
> > the GPU is struggling to make 5fps for some complex circuit boards. I'm
> > trying to address that from a geometry / rendering complexity point of
> > view, but also, I'd love to see my laptop being able to get the best out
> > of its hardware.
> >
> > Perhaps we need to account for periods when the CPU has tasks idle
> > waiting for GPU operations which would be sped up by increasing some
> > chip power state.
> >
> > I'm probably not up to coding this all, but if the idea sounds feasible,
> > I'd love to know, so I might be able to have a tinker with it.
>
> There are some bits in the GMCH to control memory behavior during CPU
> C-states. Can you dump the 16 bits at MCHBAR address 0xf08? You
> should be able to do that by doing I915_READ16(MCHBAR_MIRROR_BASE +
> 0xf08). Assuming bits 3:2 and 1:0 are nonzero, it may help to set them
> all to 0. That will disable several memory related power saving
> features while the CPU is in a deep sleep state.
Oh and bits 5:4 and bit 6 as well. Bit 6 controls whether memory stays
active in C2, and bits 5:4 control which memory shutdown features are
enabled in C2; clearing 5:4 will disable shutdown, as will bit 6.
--
Jesse Barnes, Intel Open Source Technology Center
* Re: FPS performance increase when deliberately spinning the CPU with an unrelated task
2010-10-25 20:20 ` Jesse Barnes
@ 2010-10-26 0:14 ` Peter Clifton
2010-10-26 0:26 ` Jesse Barnes
0 siblings, 1 reply; 7+ messages in thread
From: Peter Clifton @ 2010-10-26  0:14 UTC
To: Jesse Barnes; +Cc: intel-gfx
On Mon, 2010-10-25 at 13:20 -0700, Jesse Barnes wrote:
> On Mon, 25 Oct 2010 13:11:24 -0700
> Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
>
> > There are some bits in the GMCH to control memory behavior during CPU
> > C-states. Can you dump the 16 bits at MCHBAR address 0xf08? You
> > should be able to do that by doing I915_READ16(MCHBAR_MIRROR_BASE +
> > 0xf08). Assuming bits 3:2 and 1:0 are nonzero, it may help to set them
> > all to 0. That will disable several memory related power saving
> > features while the CPU is in a deep sleep state.
>
> Oh and bits 5:4 and bit 6 as well. Bit 6 controls whether memory stays
> active in C2, and bits 5:4 control which memory shutdown features are
> enabled in C2; clearing 5:4 will disable shutdown, as will bit 6.
Is there a document somewhere which describes those? I've downloaded the
GM45 datasheet, and it lists that address as "reserved".
FWIW, the value I read back is 0x730f
(I hijacked one of the debugfs functions with this:)
seq_printf(m, "PCJC2 DEBUG HACK MCHBAR_MIRROR_BASE + 0xf08: %04x\n",
I915_READ16(MCHBAR_MIRROR_BASE + 0xf08));
I trust that is correct?
Converting to bin:
15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
 0  1  1  1  0  0  1  1  0  0  0  0  1  1  1  1
                            ^  ^  ^  ^  ^  ^  ^
Should I poke this 1?       |  |__|  |__|__|__|
Already clear is ok?____________|        |___ Will try these at 0
Just off the top of your head.. is there any way of poking these from
userspace without reloading kernel modules?
Disabling CPU frequency scaling is only half of the story, as it appears
keeping the CPU out of C4 is the main benefit. I was checking with
powertop when I was playing before. I don't see C2 on my machine, only
C0, C1 and C4 (the main resident state when not working).
I understand C4 is pretty rubbish for memory usage, as it disables bus
mastering. Is that what the graphics controller will require?
Allegedly:
Your CPU supports the following C-states : C1 C2 C3 C4 C5 C6
Your BIOS reports the following C-states : C1 C4
:(
I'll await your suggestions based on the above before I try poking the
register.
--
Peter Clifton
* Re: FPS performance increase when deliberately spinning the CPU with an unrelated task
2010-10-26 0:14 ` Peter Clifton
@ 2010-10-26 0:26 ` Jesse Barnes
0 siblings, 0 replies; 7+ messages in thread
From: Jesse Barnes @ 2010-10-26  0:26 UTC
To: Peter Clifton; +Cc: intel-gfx
On Tue, 26 Oct 2010 01:14:04 +0100
Peter Clifton <pcjc2@cam.ac.uk> wrote:
> On Mon, 2010-10-25 at 13:20 -0700, Jesse Barnes wrote:
> > On Mon, 25 Oct 2010 13:11:24 -0700
> > Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> >
> > > There are some bits in the GMCH to control memory behavior during CPU
> > > C-states. Can you dump the 16 bits at MCHBAR address 0xf08? You
> > > should be able to do that by doing I915_READ16(MCHBAR_MIRROR_BASE +
> > > 0xf08). Assuming bits 3:2 and 1:0 are nonzero, it may help to set them
> > > all to 0. That will disable several memory related power saving
> > > features while the CPU is in a deep sleep state.
> >
> > Oh and bits 5:4 and bit 6 as well. Bit 6 controls whether memory stays
> > active in C2, and bits 5:4 control which memory shutdown features are
> > enabled in C2; clearing 5:4 will disable shutdown, as will bit 6.
>
> Is there a document somewhere which describes those? I've downloaded the
> GM45 datasheet, and it lists that address as "reserved".
>
> FWIW. the answer is 0x730f
Ok so aggressive power saving is enabled for deep sleep states (makes
sense). Unfortunately these bits aren't documented (even internally
it's hard to find info on this range).
>
> (I hijacked one of the debugfs functions with this:)
>
> seq_printf(m, "PCJC2 DEBUG HACK MCHBAR_MIRROR_BASE + 0xf08: %04x\n",
> I915_READ16(MCHBAR_MIRROR_BASE + 0xf08));
>
> I trust that is correct?
>
> Converting to bin:
>
> 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
>  0  1  1  1  0  0  1  1  0  0  0  0  1  1  1  1
>
>                             ^  ^  ^  ^  ^  ^  ^
> Should I poke this 1?       |  |__|  |__|__|__|
> Already clear is ok?____________|        |___ Will try these at 0
>
>
> Just off the top of your head.. is there any way of poking these from
> userspace without reloading kernel modules?
>
>
> Disabling CPU frequency scaling is only half of the story, as it appears
> keeping the CPU out of C4 is the main benefit. I was checking with
> powertop when I was playing before. I don't see C2 on my machine, only
> C0, C1 and C4 (the main resident state when not working).
>
> I understand C4 is pretty rubbish for memory usage, as it disables bus
> mastering. Is that what the graphics controller will require?
>
> Allegedly:
>
> Your CPU supports the following C-states : C1 C2 C3 C4 C5 C6
> Your BIOS reports the following C-states : C1 C4
>
> :(
>
>
> I'll wait your suggestions based on the above before I try poking the
> register.
You should be able to use intel_reg_write from intel-gpu-tools to poke
in the new values.
If C4 is the real issue, you can limit your C-state by booting with
processor.max_cstate= on the kernel command line (not cpu.max_cstate=);
that should affect the behavior of the C-state driver.
--
Jesse Barnes, Intel Open Source Technology Center
* Re: FPS performance increase when deliberately spinning the CPU with an unrelated task
2010-10-25 20:14 ` Eric Anholt
@ 2010-10-26 19:34 ` Peter Clifton
0 siblings, 0 replies; 7+ messages in thread
From: Peter Clifton @ 2010-10-26 19:34 UTC
To: intel-gfx@lists.freedesktop.org
On Mon, 2010-10-25 at 13:14 -0700, Eric Anholt wrote:
I'm taking a bet on memory bandwidth being the constraint for fill
limited applications. glxgears manages 59fps at full screen (minus title
bar), which is 1680x1024, giving a pixel rate of approx 100Mpix/second
(Ignoring over-fill)
Assuming 2 bytes per pixel (ignoring possibility of depth / stencil
writes), that is 200M bytes / second.
Now I don't know what the likely memory bandwidth of the GMCH is, but I
bet with scan out and other activity going on, we'll be pushing it quite
hard to sustain that framerate.
I'm targeting a desired ~15-20fps at nearly full-screen for my circuit
board design app, so I will need to be careful with my code!
That said, some current top-range cards from other manufacturers are
claiming fill rates in the tens of BILLIONS per second, and memory
bandwidth from tens up to ~100 GB/sec! (Although such a card comes with
more than half the RAM I have in this laptop, and probably costs about
1/2 - 3/4 of what the laptop did).
--
Peter Clifton
end of thread, other threads:[~2010-10-26 19:34 UTC | newest]
Thread overview: 7+ messages
2010-10-23 12:02 FPS performance increase when deliberately spinning the CPU with an unrelated task Peter Clifton
2010-10-25 20:11 ` Jesse Barnes
2010-10-25 20:20 ` Jesse Barnes
2010-10-26 0:14 ` Peter Clifton
2010-10-26 0:26 ` Jesse Barnes
2010-10-25 20:14 ` Eric Anholt
2010-10-26 19:34 ` Peter Clifton