From: Peter Zijlstra <peterz@infradead.org>
To: Venkatesh Pallipadi <venki@google.com>
Cc: Ingo Molnar <mingo@elte.hu>, "H. Peter Anvin" <hpa@zytor.com>,
Thomas Gleixner <tglx@linutronix.de>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
linux-kernel@vger.kernel.org, Paul Turner <pjt@google.com>,
Eric Dumazet <eric.dumazet@gmail.com>,
Len Brown <len.brown@intel.com>
Subject: Re: [PATCH 3/7] Add IRQ_TIME_ACCOUNTING, finer accounting of irq time -v3
Date: Sat, 02 Oct 2010 12:53:55 +0200 [thread overview]
Message-ID: <1286016835.2144.86.camel@laptop> (raw)
In-Reply-To: <AANLkTikcPX6m01F1rg--AKkSj_69kJJVsTrqB_FF6sJf@mail.gmail.com>
On Fri, 2010-10-01 at 16:32 -0700, Venkatesh Pallipadi wrote:
> On Fri, Oct 1, 2010 at 4:14 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Fri, 2010-10-01 at 10:29 -0700, Venkatesh Pallipadi wrote:
> >> So, on x86, sched_clock_stable is not set on all other kind of CPUs
> >> and my test system happens to be one of them. So, sched_clock_cpu()
> >> falls back to tick based even when TSC is not marked unstable and
> >> clocksource is using TSC for timing.
> >
> > It is never tick based!! It's tick augmented! Because TSC is such a
> > piece of crap we use external (slow) means of determining a window in
> > which the TSC should live and then use the TSC to generate high
> > resolution offsets inside that.
> >
> > So even if your usage is in the hardirq context that moves that window
> > it should all work out.
> >
>
> You mean there should not be any "jumps" noticed with
> sched_clock_cpu() when we are idle and get a interrupt?
> Atleast thats what I am seeing. May be there is some other bug
> somewhere causing that.
>
> Loooking at one snapshot from my earlier log
>
> <idle>-0 [] 1697.915040: : START 1700899887146
> // We were idle and got an interrupt and recorded sched_clock_cpu() as
> 1700899887146
> <idle>-0 [] 1697.915047: : HARD STOP 1700902008678, delta 2121532
> // We finished handling the interrupt and recorded sched_clock_cpu()
> as 1700902008678
> // So, delta we see is > 2ms
> // This is trace_printk based on local clock, which is using sched_clock()
> // So, the trace timing shows delta of 7 us, which is kind of expected time here
Egads, yes that would be a kernel/sched_clock.c buglet..
So we're in NOHZ and an IRQ/NMI happens that ends up calling
sched_clock_cpu() and friends without us leaving NOHZ.
drivers/acpi/processor_idle.c:acpi_idle_enter_simple() calls
sched_clock_idle_{sleep,wakeup}_event() around the idle loop -- are
there idle methods missing this, and or do we handle the interrupt
before the wakeup event?
Also, in irq_enter() we call __irq_enter() which does
account_system_vtime() before tick_check_idle() which restarts various
timers and resets jiffies and in fact already calls
sched_clock_idle_wakeup_event().
Gah what a mess.. we could try a code shuffle to restart timer/clock
bits before calling into account_system_vtime(), although I bet that'll
be interesting. But I see no way to fix the NMI during NOHZ problem, its
not like we can actually do GTOD from NMI context :/
The thing is, I really do _NOT_ trust TSC to be sane enough to use like
you want to do, its really proven itself to be reliably crap.
next prev parent reply other threads:[~2010-10-02 10:54 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-09-29 19:21 Proper kernel irq time accounting -v3 Venkatesh Pallipadi
2010-09-29 19:21 ` [PATCH 1/7] si time accounting accounts bh_disable'd time to si -v3 Venkatesh Pallipadi
2010-09-30 11:04 ` Peter Zijlstra
2010-09-30 16:26 ` Venkatesh Pallipadi
2010-10-01 23:16 ` Peter Zijlstra
2010-10-02 15:42 ` Venkatesh Pallipadi
2010-10-03 0:34 ` Peter Zijlstra
2010-10-04 16:54 ` Venkatesh Pallipadi
2010-09-29 19:21 ` [PATCH 2/7] Consolidate account_system_vtime extern declaration -v3 Venkatesh Pallipadi
2010-09-29 19:21 ` [PATCH 3/7] Add IRQ_TIME_ACCOUNTING, finer accounting of irq time -v3 Venkatesh Pallipadi
2010-09-30 11:06 ` Peter Zijlstra
2010-09-30 16:29 ` Venkatesh Pallipadi
2010-09-30 20:38 ` Venkatesh Pallipadi
2010-10-01 11:46 ` Peter Zijlstra
2010-10-01 16:51 ` Venkatesh Pallipadi
2010-10-01 17:29 ` Venkatesh Pallipadi
2010-10-01 23:14 ` Peter Zijlstra
2010-10-01 23:32 ` Venkatesh Pallipadi
2010-10-02 10:53 ` Peter Zijlstra [this message]
2010-10-02 15:26 ` Venkatesh Pallipadi
2010-10-03 0:26 ` Peter Zijlstra
2010-10-01 11:45 ` Peter Zijlstra
2010-09-29 19:21 ` [PATCH 4/7] x86: Add IRQ_TIME_ACCOUNTING in x86 -v3 Venkatesh Pallipadi
2010-09-29 19:21 ` [PATCH 5/7] sched: Do not account irq time to current task -v3 Venkatesh Pallipadi
2010-09-29 19:21 ` [PATCH 6/7] sched: Remove irq time from available CPU power -v3 Venkatesh Pallipadi
2010-09-29 19:21 ` [PATCH 7/7] Export per cpu hardirq and softirq time in proc -v3 Venkatesh Pallipadi
2010-09-30 7:59 ` Proper kernel irq time accounting -v3 Andi Kleen
2010-09-30 16:37 ` Venkatesh Pallipadi
2010-09-30 17:36 ` Andi Kleen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1286016835.2144.86.camel@laptop \
--to=peterz@infradead.org \
--cc=balbir@linux.vnet.ibm.com \
--cc=eric.dumazet@gmail.com \
--cc=hpa@zytor.com \
--cc=len.brown@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=pjt@google.com \
--cc=schwidefsky@de.ibm.com \
--cc=tglx@linutronix.de \
--cc=venki@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.