All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Guillaume Chazarain <guichaz@yahoo.fr>
Cc: mingo@redhat.com, David Dillow <dillowda@ornl.gov>,
	linux-kernel@vger.kernel.org, linux-btrace@vger.kernel.org,
	tglx@linutronix.de, Jens Axboe <jens.axboe@oracle.com>,
	nigel@suspend2.net
Subject: Re: CONFIG_NO_HZ breaks blktrace timestamps
Date: Fri, 11 Jan 2008 10:55:34 +0000	[thread overview]
Message-ID: <20080111105534.GC1589@elte.hu> (raw)
In-Reply-To: <20080111114132.084036f2@cheypa.inria.fr>


* Guillaume Chazarain <guichaz@yahoo.fr> wrote:

> David Dillow <dillowda@ornl.gov> wrote:
> 
> > Patched kernel, nohz=off:
> >   .clock_underflows              : 213887
> 
> A little bit of warning about these patches, they are WIP, that's why 
> I did not send them earlier. It regress nohz=off.

ok. I have applied all but this one:

> A bit of context: these patches aim at making sure cpu_clock() on my
> laptop (cpufreq enabled) never overflows/underflows/warps with
> CONFIG_NOHZ enabled. With these patches, I have a few hundreds
> overflows and underflows during early bootup, and then nothing :-)

cool :-)

> >     sched: Fix rq->clock overflows detection with CONFIG_NO_HZ
> 
> I think this one is the most important for David, but unfortunately it
> has some problems.
> 
> > +static inline u64 max_skipped_ticks(struct rq *rq)
> > +{
> > +	return nohz_on(cpu_of(rq)) ? jiffies - rq->last_tick_seen + 2 : 1;
> > +}
> 
> Here, I initially wrote rq->last_tick_seen + 1 but experiments showed
> that +2 was needed as I really saw deltas of 2 milliseconds.
> 
> These patches have two objectives:
>  - taking into account that jiffies are not always incremented by 1
> thanks to nohz
>  - as the tick is stopped and restarted it may not tick at the exact
> expected moment, so allow a window of 1 jiffie. If the tick occurs
> during the right jiffy, we know the TSC is more precise than the tick
> so don't correct the clock.

i think it's much simpler to do what i have below. Could you try it on 
your box? Or if it is using ACPI idle - in that case the callbacks 
should already be there and there should be no need for further fixups.

> And the problem is that I seem to need a window of 2 jiffies, so I need
> some help.
> 
> >     sched: make sure jiffies is up to date before calling __update_rq_clock()
> 
> This is one is needed too but I'm less confident in its validity.
> 
> >     scheduler_tick() is not called every jiffies
> 
> This one is a bit ugly and seems to break nohz=off.

ok, i took this one out.

	Ingo

-------------------->
Subject: x86: idle wakeup event in the HLT loop
From: Ingo Molnar <mingo@elte.hu>

do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC
in HLT too, not just when going through the ACPI methods.

(the ACPI idle code already does this.)

[ update the 64-bit side too, as noticed by Jiri Slaby. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/process_32.c |   15 ++++++++++++---
 arch/x86/kernel/process_64.c |   13 ++++++++++---
 2 files changed, 22 insertions(+), 6 deletions(-)

Index: linux-x86.q/arch/x86/kernel/process_32.c
=================================--- linux-x86.q.orig/arch/x86/kernel/process_32.c
+++ linux-x86.q/arch/x86/kernel/process_32.c
@@ -113,10 +113,19 @@ void default_idle(void)
 		smp_mb();
 
 		local_irq_disable();
-		if (!need_resched())
+		if (!need_resched()) {
+			ktime_t t0, t1;
+			u64 t0n, t1n;
+
+			t0 = ktime_get();
+			t0n = ktime_to_ns(t0);
 			safe_halt();	/* enables interrupts racelessly */
-		else
-			local_irq_enable();
+			local_irq_disable();
+			t1 = ktime_get();
+			t1n = ktime_to_ns(t1);
+			sched_clock_idle_wakeup_event(t1n - t0n);
+		}
+		local_irq_enable();
 		current_thread_info()->status |= TS_POLLING;
 	} else {
 		/* loop is done by the caller */
Index: linux-x86.q/arch/x86/kernel/process_64.c
=================================--- linux-x86.q.orig/arch/x86/kernel/process_64.c
+++ linux-x86.q/arch/x86/kernel/process_64.c
@@ -116,9 +116,16 @@ static void default_idle(void)
 	smp_mb();
 	local_irq_disable();
 	if (!need_resched()) {
-		/* Enables interrupts one instruction before HLT.
-		   x86 special cases this so there is no race. */
-		safe_halt();
+		ktime_t t0, t1;
+		u64 t0n, t1n;
+
+		t0 = ktime_get();
+		t0n = ktime_to_ns(t0);
+		safe_halt();	/* enables interrupts racelessly */
+		local_irq_disable();
+		t1 = ktime_get();
+		t1n = ktime_to_ns(t1);
+		sched_clock_idle_wakeup_event(t1n - t0n);
 	} else
 		local_irq_enable();
 	current_thread_info()->status |= TS_POLLING;

WARNING: multiple messages have this Message-ID (diff)
From: Ingo Molnar <mingo@elte.hu>
To: Guillaume Chazarain <guichaz@yahoo.fr>
Cc: mingo@redhat.com, David Dillow <dillowda@ornl.gov>,
	linux-kernel@vger.kernel.org, linux-btrace@vger.kernel.org,
	tglx@linutronix.de, Jens Axboe <jens.axboe@oracle.com>,
	nigel@suspend2.net
Subject: Re: CONFIG_NO_HZ breaks blktrace timestamps
Date: Fri, 11 Jan 2008 11:55:34 +0100	[thread overview]
Message-ID: <20080111105534.GC1589@elte.hu> (raw)
In-Reply-To: <20080111114132.084036f2@cheypa.inria.fr>


* Guillaume Chazarain <guichaz@yahoo.fr> wrote:

> David Dillow <dillowda@ornl.gov> wrote:
> 
> > Patched kernel, nohz=off:
> >   .clock_underflows              : 213887
> 
> A little bit of warning about these patches, they are WIP, that's why 
> I did not send them earlier. It regress nohz=off.

ok. I have applied all but this one:

> A bit of context: these patches aim at making sure cpu_clock() on my
> laptop (cpufreq enabled) never overflows/underflows/warps with
> CONFIG_NOHZ enabled. With these patches, I have a few hundreds
> overflows and underflows during early bootup, and then nothing :-)

cool :-)

> >     sched: Fix rq->clock overflows detection with CONFIG_NO_HZ
> 
> I think this one is the most important for David, but unfortunately it
> has some problems.
> 
> > +static inline u64 max_skipped_ticks(struct rq *rq)
> > +{
> > +	return nohz_on(cpu_of(rq)) ? jiffies - rq->last_tick_seen + 2 : 1;
> > +}
> 
> Here, I initially wrote rq->last_tick_seen + 1 but experiments showed
> that +2 was needed as I really saw deltas of 2 milliseconds.
> 
> These patches have two objectives:
>  - taking into account that jiffies are not always incremented by 1
> thanks to nohz
>  - as the tick is stopped and restarted it may not tick at the exact
> expected moment, so allow a window of 1 jiffie. If the tick occurs
> during the right jiffy, we know the TSC is more precise than the tick
> so don't correct the clock.

i think it's much simpler to do what i have below. Could you try it on 
your box? Or if it is using ACPI idle - in that case the callbacks 
should already be there and there should be no need for further fixups.

> And the problem is that I seem to need a window of 2 jiffies, so I need
> some help.
> 
> >     sched: make sure jiffies is up to date before calling __update_rq_clock()
> 
> This is one is needed too but I'm less confident in its validity.
> 
> >     scheduler_tick() is not called every jiffies
> 
> This one is a bit ugly and seems to break nohz=off.

ok, i took this one out.

	Ingo

-------------------->
Subject: x86: idle wakeup event in the HLT loop
From: Ingo Molnar <mingo@elte.hu>

do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC
in HLT too, not just when going through the ACPI methods.

(the ACPI idle code already does this.)

[ update the 64-bit side too, as noticed by Jiri Slaby. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/process_32.c |   15 ++++++++++++---
 arch/x86/kernel/process_64.c |   13 ++++++++++---
 2 files changed, 22 insertions(+), 6 deletions(-)

Index: linux-x86.q/arch/x86/kernel/process_32.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/process_32.c
+++ linux-x86.q/arch/x86/kernel/process_32.c
@@ -113,10 +113,19 @@ void default_idle(void)
 		smp_mb();
 
 		local_irq_disable();
-		if (!need_resched())
+		if (!need_resched()) {
+			ktime_t t0, t1;
+			u64 t0n, t1n;
+
+			t0 = ktime_get();
+			t0n = ktime_to_ns(t0);
 			safe_halt();	/* enables interrupts racelessly */
-		else
-			local_irq_enable();
+			local_irq_disable();
+			t1 = ktime_get();
+			t1n = ktime_to_ns(t1);
+			sched_clock_idle_wakeup_event(t1n - t0n);
+		}
+		local_irq_enable();
 		current_thread_info()->status |= TS_POLLING;
 	} else {
 		/* loop is done by the caller */
Index: linux-x86.q/arch/x86/kernel/process_64.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/process_64.c
+++ linux-x86.q/arch/x86/kernel/process_64.c
@@ -116,9 +116,16 @@ static void default_idle(void)
 	smp_mb();
 	local_irq_disable();
 	if (!need_resched()) {
-		/* Enables interrupts one instruction before HLT.
-		   x86 special cases this so there is no race. */
-		safe_halt();
+		ktime_t t0, t1;
+		u64 t0n, t1n;
+
+		t0 = ktime_get();
+		t0n = ktime_to_ns(t0);
+		safe_halt();	/* enables interrupts racelessly */
+		local_irq_disable();
+		t1 = ktime_get();
+		t1n = ktime_to_ns(t1);
+		sched_clock_idle_wakeup_event(t1n - t0n);
 	} else
 		local_irq_enable();
 	current_thread_info()->status |= TS_POLLING;

  reply	other threads:[~2008-01-11 10:55 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-09 22:48 CONFIG_NO_HZ breaks blktrace timestamps David Dillow
2008-01-09 22:48 ` David Dillow
2008-01-10 20:25 ` David Dillow
2008-01-10 20:25   ` David Dillow
2008-01-10 22:44   ` Guillaume Chazarain
2008-01-10 22:44     ` Guillaume Chazarain
2008-01-11  3:04     ` David Dillow
2008-01-11  3:04       ` David Dillow
2008-01-11  9:07       ` Jens Axboe
2008-01-11  9:07         ` Jens Axboe
2008-01-11  9:23         ` Ingo Molnar
2008-01-11  9:23           ` Ingo Molnar
2008-01-11  9:25           ` Jens Axboe
2008-01-11  9:25             ` Jens Axboe
2008-01-11  9:42             ` Ingo Molnar
2008-01-11  9:42               ` Ingo Molnar
2008-01-11  9:56               ` Jens Axboe
2008-01-11  9:56                 ` Jens Axboe
2008-01-11 10:29                 ` [patch] block: fix " Ingo Molnar
2008-01-11 10:29                   ` Ingo Molnar
2008-01-11 10:47                   ` Guillaume Chazarain
2008-01-11 10:47                     ` Guillaume Chazarain
2008-01-11 10:50                     ` Ingo Molnar
2008-01-11 10:50                       ` Ingo Molnar
2008-01-11 12:28                   ` Jens Axboe
2008-01-11 12:28                     ` Jens Axboe
2008-01-11 12:42                     ` Jens Axboe
2008-01-11 12:42                       ` Jens Axboe
2008-01-11 13:21                     ` Ingo Molnar
2008-01-11 13:21                       ` Ingo Molnar
2008-01-11 17:18                       ` Jens Axboe
2008-01-11 17:18                         ` Jens Axboe
2008-01-14  7:51                         ` Ingo Molnar
2008-01-14  7:51                           ` Ingo Molnar
2008-01-14  7:59                           ` Ingo Molnar
2008-01-14  7:59                             ` Ingo Molnar
2008-01-14  8:10                             ` Jens Axboe
2008-01-14  8:10                               ` Jens Axboe
2008-01-14  8:39                           ` Christoph Hellwig
2008-01-14  8:42                             ` Jens Axboe
2008-01-14  8:42                               ` Jens Axboe
2008-01-11 15:36                   ` David Dillow
2008-01-11 15:36                     ` David Dillow
2008-01-11 16:44                     ` Ingo Molnar
2008-01-11 16:44                       ` Ingo Molnar
2008-01-11 17:26                       ` Jens Axboe
2008-01-11 17:26                         ` Jens Axboe
2008-01-11  9:29           ` CONFIG_NO_HZ breaks " Ingo Molnar
2008-01-11  9:29             ` Ingo Molnar
2008-01-11  9:34             ` Jens Axboe
2008-01-11  9:34               ` Jens Axboe
2008-01-11  9:28         ` nigel
2008-01-11  9:28           ` nigel
2008-01-11  9:32           ` Ingo Molnar
2008-01-11  9:32             ` Ingo Molnar
2008-01-13 22:54             ` nigel
2008-01-13 22:54               ` nigel
2008-01-11  9:44       ` Ingo Molnar
2008-01-11  9:44         ` Ingo Molnar
2008-01-11  9:51         ` Ingo Molnar
2008-01-11  9:51           ` Ingo Molnar
2008-01-11 10:41     ` Guillaume Chazarain
2008-01-11 10:41       ` Guillaume Chazarain
2008-01-11 10:55       ` Ingo Molnar [this message]
2008-01-11 10:55         ` Ingo Molnar
2008-01-11 22:30         ` Guillaume Chazarain
2008-01-11 22:30           ` Guillaume Chazarain
2008-01-12  0:03           ` Guillaume Chazarain
2008-01-12  0:03             ` Guillaume Chazarain
2008-01-11  9:34   ` Ingo Molnar
2008-01-11  9:34     ` Ingo Molnar
2008-01-11 15:43     ` David Dillow
2008-01-11 15:43       ` David Dillow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080111105534.GC1589@elte.hu \
    --to=mingo@elte.hu \
    --cc=dillowda@ornl.gov \
    --cc=guichaz@yahoo.fr \
    --cc=jens.axboe@oracle.com \
    --cc=linux-btrace@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=nigel@suspend2.net \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.