From: Ingo Molnar <mingo@elte.hu>
To: Guillaume Chazarain <guichaz@yahoo.fr>
Cc: mingo@redhat.com, David Dillow <dillowda@ornl.gov>,
linux-kernel@vger.kernel.org, linux-btrace@vger.kernel.org,
tglx@linutronix.de, Jens Axboe <jens.axboe@oracle.com>,
nigel@suspend2.net
Subject: Re: CONFIG_NO_HZ breaks blktrace timestamps
Date: Fri, 11 Jan 2008 10:55:34 +0000 [thread overview]
Message-ID: <20080111105534.GC1589@elte.hu> (raw)
In-Reply-To: <20080111114132.084036f2@cheypa.inria.fr>
* Guillaume Chazarain <guichaz@yahoo.fr> wrote:
> David Dillow <dillowda@ornl.gov> wrote:
>
> > Patched kernel, nohz=off:
> > .clock_underflows : 213887
>
> A little bit of warning about these patches, they are WIP, that's why
> I did not send them earlier. It regress nohz=off.
ok. I have applied all but this one:
> A bit of context: these patches aim at making sure cpu_clock() on my
> laptop (cpufreq enabled) never overflows/underflows/warps with
> CONFIG_NOHZ enabled. With these patches, I have a few hundreds
> overflows and underflows during early bootup, and then nothing :-)
cool :-)
> > sched: Fix rq->clock overflows detection with CONFIG_NO_HZ
>
> I think this one is the most important for David, but unfortunately it
> has some problems.
>
> > +static inline u64 max_skipped_ticks(struct rq *rq)
> > +{
> > + return nohz_on(cpu_of(rq)) ? jiffies - rq->last_tick_seen + 2 : 1;
> > +}
>
> Here, I initially wrote rq->last_tick_seen + 1 but experiments showed
> that +2 was needed as I really saw deltas of 2 milliseconds.
>
> These patches have two objectives:
> - taking into account that jiffies are not always incremented by 1
> thanks to nohz
> - as the tick is stopped and restarted it may not tick at the exact
> expected moment, so allow a window of 1 jiffie. If the tick occurs
> during the right jiffy, we know the TSC is more precise than the tick
> so don't correct the clock.
i think it's much simpler to do what i have below. Could you try it on
your box? Or if it is using ACPI idle - in that case the callbacks
should already be there and there should be no need for further fixups.
> And the problem is that I seem to need a window of 2 jiffies, so I need
> some help.
>
> > sched: make sure jiffies is up to date before calling __update_rq_clock()
>
> This is one is needed too but I'm less confident in its validity.
>
> > scheduler_tick() is not called every jiffies
>
> This one is a bit ugly and seems to break nohz=off.
ok, i took this one out.
Ingo
-------------------->
Subject: x86: idle wakeup event in the HLT loop
From: Ingo Molnar <mingo@elte.hu>
do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC
in HLT too, not just when going through the ACPI methods.
(the ACPI idle code already does this.)
[ update the 64-bit side too, as noticed by Jiri Slaby. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/process_32.c | 15 ++++++++++++---
arch/x86/kernel/process_64.c | 13 ++++++++++---
2 files changed, 22 insertions(+), 6 deletions(-)
Index: linux-x86.q/arch/x86/kernel/process_32.c
=================================--- linux-x86.q.orig/arch/x86/kernel/process_32.c
+++ linux-x86.q/arch/x86/kernel/process_32.c
@@ -113,10 +113,19 @@ void default_idle(void)
smp_mb();
local_irq_disable();
- if (!need_resched())
+ if (!need_resched()) {
+ ktime_t t0, t1;
+ u64 t0n, t1n;
+
+ t0 = ktime_get();
+ t0n = ktime_to_ns(t0);
safe_halt(); /* enables interrupts racelessly */
- else
- local_irq_enable();
+ local_irq_disable();
+ t1 = ktime_get();
+ t1n = ktime_to_ns(t1);
+ sched_clock_idle_wakeup_event(t1n - t0n);
+ }
+ local_irq_enable();
current_thread_info()->status |= TS_POLLING;
} else {
/* loop is done by the caller */
Index: linux-x86.q/arch/x86/kernel/process_64.c
=================================--- linux-x86.q.orig/arch/x86/kernel/process_64.c
+++ linux-x86.q/arch/x86/kernel/process_64.c
@@ -116,9 +116,16 @@ static void default_idle(void)
smp_mb();
local_irq_disable();
if (!need_resched()) {
- /* Enables interrupts one instruction before HLT.
- x86 special cases this so there is no race. */
- safe_halt();
+ ktime_t t0, t1;
+ u64 t0n, t1n;
+
+ t0 = ktime_get();
+ t0n = ktime_to_ns(t0);
+ safe_halt(); /* enables interrupts racelessly */
+ local_irq_disable();
+ t1 = ktime_get();
+ t1n = ktime_to_ns(t1);
+ sched_clock_idle_wakeup_event(t1n - t0n);
} else
local_irq_enable();
current_thread_info()->status |= TS_POLLING;
WARNING: multiple messages have this Message-ID (diff)
From: Ingo Molnar <mingo@elte.hu>
To: Guillaume Chazarain <guichaz@yahoo.fr>
Cc: mingo@redhat.com, David Dillow <dillowda@ornl.gov>,
linux-kernel@vger.kernel.org, linux-btrace@vger.kernel.org,
tglx@linutronix.de, Jens Axboe <jens.axboe@oracle.com>,
nigel@suspend2.net
Subject: Re: CONFIG_NO_HZ breaks blktrace timestamps
Date: Fri, 11 Jan 2008 11:55:34 +0100 [thread overview]
Message-ID: <20080111105534.GC1589@elte.hu> (raw)
In-Reply-To: <20080111114132.084036f2@cheypa.inria.fr>
* Guillaume Chazarain <guichaz@yahoo.fr> wrote:
> David Dillow <dillowda@ornl.gov> wrote:
>
> > Patched kernel, nohz=off:
> > .clock_underflows : 213887
>
> A little bit of warning about these patches, they are WIP, that's why
> I did not send them earlier. It regress nohz=off.
ok. I have applied all but this one:
> A bit of context: these patches aim at making sure cpu_clock() on my
> laptop (cpufreq enabled) never overflows/underflows/warps with
> CONFIG_NOHZ enabled. With these patches, I have a few hundreds
> overflows and underflows during early bootup, and then nothing :-)
cool :-)
> > sched: Fix rq->clock overflows detection with CONFIG_NO_HZ
>
> I think this one is the most important for David, but unfortunately it
> has some problems.
>
> > +static inline u64 max_skipped_ticks(struct rq *rq)
> > +{
> > + return nohz_on(cpu_of(rq)) ? jiffies - rq->last_tick_seen + 2 : 1;
> > +}
>
> Here, I initially wrote rq->last_tick_seen + 1 but experiments showed
> that +2 was needed as I really saw deltas of 2 milliseconds.
>
> These patches have two objectives:
> - taking into account that jiffies are not always incremented by 1
> thanks to nohz
> - as the tick is stopped and restarted it may not tick at the exact
> expected moment, so allow a window of 1 jiffie. If the tick occurs
> during the right jiffy, we know the TSC is more precise than the tick
> so don't correct the clock.
i think it's much simpler to do what i have below. Could you try it on
your box? Or if it is using ACPI idle - in that case the callbacks
should already be there and there should be no need for further fixups.
> And the problem is that I seem to need a window of 2 jiffies, so I need
> some help.
>
> > sched: make sure jiffies is up to date before calling __update_rq_clock()
>
> This is one is needed too but I'm less confident in its validity.
>
> > scheduler_tick() is not called every jiffies
>
> This one is a bit ugly and seems to break nohz=off.
ok, i took this one out.
Ingo
-------------------->
Subject: x86: idle wakeup event in the HLT loop
From: Ingo Molnar <mingo@elte.hu>
do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC
in HLT too, not just when going through the ACPI methods.
(the ACPI idle code already does this.)
[ update the 64-bit side too, as noticed by Jiri Slaby. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/process_32.c | 15 ++++++++++++---
arch/x86/kernel/process_64.c | 13 ++++++++++---
2 files changed, 22 insertions(+), 6 deletions(-)
Index: linux-x86.q/arch/x86/kernel/process_32.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/process_32.c
+++ linux-x86.q/arch/x86/kernel/process_32.c
@@ -113,10 +113,19 @@ void default_idle(void)
smp_mb();
local_irq_disable();
- if (!need_resched())
+ if (!need_resched()) {
+ ktime_t t0, t1;
+ u64 t0n, t1n;
+
+ t0 = ktime_get();
+ t0n = ktime_to_ns(t0);
safe_halt(); /* enables interrupts racelessly */
- else
- local_irq_enable();
+ local_irq_disable();
+ t1 = ktime_get();
+ t1n = ktime_to_ns(t1);
+ sched_clock_idle_wakeup_event(t1n - t0n);
+ }
+ local_irq_enable();
current_thread_info()->status |= TS_POLLING;
} else {
/* loop is done by the caller */
Index: linux-x86.q/arch/x86/kernel/process_64.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/process_64.c
+++ linux-x86.q/arch/x86/kernel/process_64.c
@@ -116,9 +116,16 @@ static void default_idle(void)
smp_mb();
local_irq_disable();
if (!need_resched()) {
- /* Enables interrupts one instruction before HLT.
- x86 special cases this so there is no race. */
- safe_halt();
+ ktime_t t0, t1;
+ u64 t0n, t1n;
+
+ t0 = ktime_get();
+ t0n = ktime_to_ns(t0);
+ safe_halt(); /* enables interrupts racelessly */
+ local_irq_disable();
+ t1 = ktime_get();
+ t1n = ktime_to_ns(t1);
+ sched_clock_idle_wakeup_event(t1n - t0n);
} else
local_irq_enable();
current_thread_info()->status |= TS_POLLING;
next prev parent reply other threads:[~2008-01-11 10:55 UTC|newest]
Thread overview: 73+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-01-09 22:48 CONFIG_NO_HZ breaks blktrace timestamps David Dillow
2008-01-09 22:48 ` David Dillow
2008-01-10 20:25 ` David Dillow
2008-01-10 20:25 ` David Dillow
2008-01-10 22:44 ` Guillaume Chazarain
2008-01-10 22:44 ` Guillaume Chazarain
2008-01-11 3:04 ` David Dillow
2008-01-11 3:04 ` David Dillow
2008-01-11 9:07 ` Jens Axboe
2008-01-11 9:07 ` Jens Axboe
2008-01-11 9:23 ` Ingo Molnar
2008-01-11 9:23 ` Ingo Molnar
2008-01-11 9:25 ` Jens Axboe
2008-01-11 9:25 ` Jens Axboe
2008-01-11 9:42 ` Ingo Molnar
2008-01-11 9:42 ` Ingo Molnar
2008-01-11 9:56 ` Jens Axboe
2008-01-11 9:56 ` Jens Axboe
2008-01-11 10:29 ` [patch] block: fix " Ingo Molnar
2008-01-11 10:29 ` Ingo Molnar
2008-01-11 10:47 ` Guillaume Chazarain
2008-01-11 10:47 ` Guillaume Chazarain
2008-01-11 10:50 ` Ingo Molnar
2008-01-11 10:50 ` Ingo Molnar
2008-01-11 12:28 ` Jens Axboe
2008-01-11 12:28 ` Jens Axboe
2008-01-11 12:42 ` Jens Axboe
2008-01-11 12:42 ` Jens Axboe
2008-01-11 13:21 ` Ingo Molnar
2008-01-11 13:21 ` Ingo Molnar
2008-01-11 17:18 ` Jens Axboe
2008-01-11 17:18 ` Jens Axboe
2008-01-14 7:51 ` Ingo Molnar
2008-01-14 7:51 ` Ingo Molnar
2008-01-14 7:59 ` Ingo Molnar
2008-01-14 7:59 ` Ingo Molnar
2008-01-14 8:10 ` Jens Axboe
2008-01-14 8:10 ` Jens Axboe
2008-01-14 8:39 ` Christoph Hellwig
2008-01-14 8:42 ` Jens Axboe
2008-01-14 8:42 ` Jens Axboe
2008-01-11 15:36 ` David Dillow
2008-01-11 15:36 ` David Dillow
2008-01-11 16:44 ` Ingo Molnar
2008-01-11 16:44 ` Ingo Molnar
2008-01-11 17:26 ` Jens Axboe
2008-01-11 17:26 ` Jens Axboe
2008-01-11 9:29 ` CONFIG_NO_HZ breaks " Ingo Molnar
2008-01-11 9:29 ` Ingo Molnar
2008-01-11 9:34 ` Jens Axboe
2008-01-11 9:34 ` Jens Axboe
2008-01-11 9:28 ` nigel
2008-01-11 9:28 ` nigel
2008-01-11 9:32 ` Ingo Molnar
2008-01-11 9:32 ` Ingo Molnar
2008-01-13 22:54 ` nigel
2008-01-13 22:54 ` nigel
2008-01-11 9:44 ` Ingo Molnar
2008-01-11 9:44 ` Ingo Molnar
2008-01-11 9:51 ` Ingo Molnar
2008-01-11 9:51 ` Ingo Molnar
2008-01-11 10:41 ` Guillaume Chazarain
2008-01-11 10:41 ` Guillaume Chazarain
2008-01-11 10:55 ` Ingo Molnar [this message]
2008-01-11 10:55 ` Ingo Molnar
2008-01-11 22:30 ` Guillaume Chazarain
2008-01-11 22:30 ` Guillaume Chazarain
2008-01-12 0:03 ` Guillaume Chazarain
2008-01-12 0:03 ` Guillaume Chazarain
2008-01-11 9:34 ` Ingo Molnar
2008-01-11 9:34 ` Ingo Molnar
2008-01-11 15:43 ` David Dillow
2008-01-11 15:43 ` David Dillow
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080111105534.GC1589@elte.hu \
--to=mingo@elte.hu \
--cc=dillowda@ornl.gov \
--cc=guichaz@yahoo.fr \
--cc=jens.axboe@oracle.com \
--cc=linux-btrace@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=nigel@suspend2.net \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.