public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
To: Andi Kleen <ak@suse.de>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>,
	Andrew Morton <akpm@osdl.org>, Linus Torvalds <torvalds@osdl.org>,
	mingo@elte.hu, linux-kernel <linux-kernel@vger.kernel.org>,
	Rajesh Shah <rajesh.shah@intel.com>,
	John Stultz <johnstul@us.ibm.com>,
	Asit K Mallick <asit.k.mallick@intel.com>
Subject: Re: [RFC][PATCH] i386 x86-64 Eliminate Local APIC timer interrupt
Date: Mon, 2 May 2005 13:27:37 -0700	[thread overview]
Message-ID: <20050502132737.A7309@unix-os.sc.intel.com> (raw)
In-Reply-To: <20050502190850.GN7342@wotan.suse.de>; from ak@suse.de on Mon, May 02, 2005 at 09:08:50PM +0200

On Mon, May 02, 2005 at 09:08:50PM +0200, Andi Kleen wrote:
> I thought about it more and i really dislike the broadcast timer
> more and more. Zwanes point on creating a lot of contention
> on irq0 datastructures is also a very good one.
> 

Actually, as IRQ0 will become PER_CPU din broadcast, there are no irq0 lock 
contention. Only contention that can be there is in scheduler idle balancing,
which should not be an issue as the CPU is idle anyway.

> > Fully agree with you on the mess part :(. Few other options that we had 
> > thought about earlier:
> > - Have some sort of callbacks while entering/exiting C3, and hand manipulate 
> >   Local APIC timer counter to account for the time spent in C3 state. This is
> >   less intrusive change (affects only the system that has C3), but code starts 
> >   getting ugly once we have time spent in C3 exceed a jiffy and spans across
> >   multiple jiffies. And we have to have some execute some code to handle all
> >   the lost local APIC timer idle ticks (for the statistics part) and can 
> >   increase C3 wakeup latency higher.
> 
> It is a bit messy agreed, but no timer tick in idle has to do this
> anyways. And we need to communicate with the ACPI idle code even
> because we need to shorten delays artificially in lower sleep
> modi (e.g. in C1 you dont want to sleep for longer than a ms 
> before waking up and switching into C2) 
> 
> So given that we need this anyways (and I have it partly coded up
> already) I think that is the way to go. The no tick code has 
> to query the backing time in this case anyways (or rather use the TSC 
> instead which is local - and I hope is still accurate even after C3) 

Unfortunately no :(. TSC will also stop in C3. Myself and John are working on
another patch to fix TSC based gettimeofday() to handle this (atleast in UP 
case) It is almost impossible in SMP, as TSCs can go out of sync with C3 on SMP.

So, ACPI PM timer or HPET seem to be the only option for backing time.

> and fix the timer up. So the infrastructure is there already
> and the APIC problem can be handled in the same way.
>
> BTW can you confirm that the APIC timer frequency is stable
> over cpufreq changes on your x86-64 CPUs, or does this need
> to be handled too? 

I haven't seen any issues with cpufreq on APIC timer, as APIC timers run 
based on FSB.

> 
> The only drawback is that these systems will pretty much need
> no timer tick in idle then to be reliable - i had hoped
> to keep it an experimental option at the beginning; but I guess
> we can accelerate it a bit.
> 
> So I would propose to go with this variant.
> 
> What do you think?
> 

OK. I have some code that I had prototyped to fixup local APIC counts in 
presence of C3s. I guess we can work together and solve this one faster. 
I will send you the patch in a separate mail.


> > - Other simpler solution is to remove idle from cpu_usage_stat and use 
> >   (overall time - other accounted time) instead. This doesn't really solve
> >   the problem, but it is a workaround for all the code that depends on 
> >   proper idle statistics.
> 
> What code is that exactly?
> 
> In no idle tick I just do the same thing as s390 and accumulate any lost ticks
> up in a loop. However I have not done much measurements how affected
> the CPU time statistics are.
> 
> But when the choice is between better power saving and slightly
> less accurate statistics I will prefer better power saving any day - and
> the people who really need good statistics are always free to turn
> off the power saving, but I doubt that will happen often as long
> as the statistics are "good enough". e.g. we can be factor 10 worse
> and not be worse than 2.4 with HZ=100. That is a lot of play ground.


cpufreq_ondemand governor depends on the idle statistics. And due to the 
wrong idle statistics, the governor will keep the CPU frequency at maximum,
loosing all the power advantages of cpufreq. So, question is not a simple 
power savings against accurate atatistics. Accurate statistics is related
to power savings as well...

Thanks,
Venki




  reply	other threads:[~2005-05-02 20:28 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-30  0:26 [RFC][PATCH] i386 x86-64 Eliminate Local APIC timer interrupt Venkatesh Pallipadi
2005-04-30  0:46 ` Zwane Mwaikambo
2005-04-30  0:58   ` Zwane Mwaikambo
2005-04-30  1:13 ` Zwane Mwaikambo
2005-04-30  2:32 ` Andrew Morton
2005-05-02 16:38 ` Andi Kleen
2005-05-02 17:16   ` Venkatesh Pallipadi
2005-05-02 19:08     ` Andi Kleen
2005-05-02 20:27       ` Venkatesh Pallipadi [this message]
2005-05-03 14:17         ` Andi Kleen
2005-05-05  5:33       ` Len Brown
2005-05-05 12:19         ` Andi Kleen
2005-05-11 18:12           ` Tony Lindgren
2005-05-05 20:45 ` George Anzinger
  -- strict thread matches above, loose matches on Subject: below --
2005-04-30  2:43 Pallipadi, Venkatesh
2005-05-05  4:16 ` Len Brown
2005-05-05 12:03   ` Andi Kleen
2005-05-05 12:32   ` Maciej W. Rozycki
2005-04-30  2:55 Pallipadi, Venkatesh
2005-04-30  3:06 ` Andrew Morton
2005-05-02 21:19   ` Pavel Machek
2005-04-30 19:40 Protasevich, Natalie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050502132737.A7309@unix-os.sc.intel.com \
    --to=venkatesh.pallipadi@intel.com \
    --cc=ak@suse.de \
    --cc=akpm@osdl.org \
    --cc=asit.k.mallick@intel.com \
    --cc=johnstul@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=rajesh.shah@intel.com \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox