linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: john stultz <johnstul@us.ibm.com>
To: Jon Hunter <jon-hunter@ti.com>
Cc: Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep    formorethan2.15 seconds
Date: Tue, 12 May 2009 16:58:47 -0700	[thread overview]
Message-ID: <1242172727.3462.55.camel@localhost> (raw)
In-Reply-To: <4A0A07D6.90408@ti.com>

On Tue, 2009-05-12 at 18:35 -0500, Jon Hunter wrote:
> john stultz wrote:
> > Yea. NSEC_PER_SEC/HZ would probably be safe. I was initially thinking
> > being more paranoid and just dividing it in half, but that's probably a
> > bit silly.
> 
> Thanks, I have added the code to subtract NSEC_PER_SEC/HZ. Should we 
> have any concerns about the adjustment of the mult value? This is the 
> only thing that could impact the value returned from 
> timekeeping_max_deferment(). I am not familiar with exactly how this is 
> working so just wanted to ask.

Well, the mult adjustments should be quite small, especially compared to
the NSEC_PER_SEC/HZ adjustment.

Hmm... Although, I guess we could get bitten if the max_deferment was
like an hour, and the adjustment was enough that it scaled out to and we
ended up being a second late or so. So you have a point.

But since the clockevent driver is not scaled, we probably can get away
with using the orig_mult value instead of mult, and be ok.

Alternatively instead of NSEC_PER_SEC/HZ, we could always drop the
larger of NSEC_PER_SEC/HZ or max_deferment/10? That way we should scale
up without a problem. 

I suspect it would be tough to hit this issue though.

> > As far the decision to defer if the next even is greater then one jiffy
> > away, that seems reasonable, but I'd not embed that into the
> > timekeeping_max_deferrment(). 
> > 
> > I'm suggesting we drop timekeeping_max_deferrment() down since that's
> > the absolute maximum and we're sure to break if we actually wait that
> > long (since the time between clocksource reads would certainly be longer
> > due to execution delay). 1HZ seems reasonable, since we should easily be
> > able to run the tick code twice in that time, as well as it should be
> > easily within the interrupt programming granularity.
> > 
> > Any additional decisions as to how far out we should be before we start
> > skipping ticks would be up to the tick resched code, and shouldn't be in
> > the timekeeping function.
> > 
> > Sound sane? If so add that in and I'll ack it.
> 
> Yes, agree. See below. By the way I have kept the below patch separate 
> from the original I posted here:
> 
> http://marc.info/?l=linux-kernel&m=124026224019895&w=2
> 
> I was not sure if you would prefer to keep these as two patch series or 
> make it one single patch. Let me know if you would like me to combine or 
> re-post as a two patch series.

Two patches should be fine.

> Please note that the environment I have been running some basic tests on 
> is a single core ARM device. I just wanted to let you know in case you 
> have any concerns with this.
> 
> > This looks *much* better to me. Thanks for reworking it!
> 
> Great! No problem. Thanks for your help and feedback.
> 
> Cheers
> Jon
> 
> 
> Signed-off-by: Jon Hunter <jon-hunter@ti.com>

Looks good overall. We may want to add the -10% (or -5%) to be totally
safe, but that's likely just me being paranoid.

Also one more safety issue below.

Otherwise,
Acked-by: John Stultz <johnstul@us.ibm.com>


thanks
-john


> ---
>   include/linux/time.h      |    1 +
>   kernel/time/tick-sched.c  |   36 +++++++++++++++++++++++++-----------
>   kernel/time/timekeeping.c |   19 +++++++++++++++++++
>   3 files changed, 45 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/time.h b/include/linux/time.h
> index 242f624..090be07 100644
> --- a/include/linux/time.h
> +++ b/include/linux/time.h
> @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts);
> 
>   extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
>   extern int timekeeping_valid_for_hres(void);
> +extern s64 timekeeping_max_deferment(void);
>   extern void update_wall_time(void);
>   extern void update_xtime_cache(u64 nsec);
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index d3f1ef4..f0155ae 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>   	ktime_t last_update, expires, now;
>   	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
>   	int cpu;
> +	s64 time_delta, max_time_delta;
> 
>   	local_irq_save(flags);
> 
> @@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>   		seq = read_seqbegin(&xtime_lock);
>   		last_update = last_jiffies_update;
>   		last_jiffies = jiffies;
> +		max_time_delta = timekeeping_max_deferment();
>   	} while (read_seqretry(&xtime_lock, seq));
> 
>   	/* Get the next timer wheel timer */
> @@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle)
>   	if ((long)delta_jiffies >= 1) {
> 
>   		/*
> -		* calculate the expiry time for the next timer wheel
> -		* timer
> -		*/
> -		expires = ktime_add_ns(last_update, tick_period.tv64 *
> -				   delta_jiffies);
> +		 * Calculate the time delta for the next timer event.
> +		 * If the time delta exceeds the maximum time delta
> +		 * permitted by the current clocksource then adjust
> +		 * the time delta accordingly to ensure the
> +		 * clocksource does not wrap.
> +		 */
> +		time_delta = tick_period.tv64 * delta_jiffies;
> +
> +		if (time_delta > max_time_delta)
> +			time_delta = max_time_delta;
> +
> +		/*
> +		 * calculate the expiry time for the next timer wheel
> +		 * timer
> +		 */
> +		expires = ktime_add_ns(last_update, time_delta);
> 
>   		/*
>   		 * If this cpu is the one which updates jiffies, then
> @@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>   		if (cpu == tick_do_timer_cpu)
>   			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
> 
> -		if (delta_jiffies > 1)
> +		if (time_delta > tick_period.tv64)
>   			cpumask_set_cpu(cpu, nohz_cpu_mask);
> 
>   		/* Skip reprogram of event if its not changed */
> @@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle)
>   		ts->idle_sleeps++;
> 
>   		/*
> -		 * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
> -		 * there is no timer pending or at least extremly far
> -		 * into the future (12 days for HZ=1000). In this case
> -		 * we simply stop the tick timer:
> +		 * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA)
> +		 * signals that there is no timer pending or at least
> +		 * extremely far into the future (12 days for HZ=1000).
> +		 * In this case we simply stop the tick timer:
>   		 */
> -		if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
> +		if (unlikely(time_delta >=
> +				(tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) {
>   			ts->idle_expires.tv64 = KTIME_MAX;
>   			if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
>   				hrtimer_cancel(&ts->sched_timer);
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 687dff4..7617fbe 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -271,6 +271,25 @@ int timekeeping_valid_for_hres(void)
>   }
> 
>   /**
> + * timekeeping_max_deferment - Returns max time the clocksource can be 
> deferred
> + *
> + * IMPORTANT: Must be called with xtime_lock held!
> + */
> +s64 timekeeping_max_deferment(void)
> +{
> +	s64 max_nsecs;
> +
> +	/*
> +	 * Limit the time the clocksource can be
> +	 * deferred by one jiffie period to ensure
> +	 * that the clocksource will not wrap.
> +	 */
> +	max_nsecs = cyc2ns(clock, clock->mask) - (NSEC_PER_SEC/HZ);
> +

This seems really unlikely, but you might want to add something like:

	if (max_nsecs < 0)
		max_nsecs = 0;

To avoid negative underflows. I don't see how a system could be running
in highres mode if the clocksource isn't continuous for longer then a
tick, but probably a good idea none the less.


> +	return max_nsecs;
> +}
> +
> +/**
>    * read_persistent_clock -  Return time in seconds from the persistent 
> clock.
>    *
>    * Weak dummy function for arches that do not yet support it.


  reply	other threads:[~2009-05-12 23:58 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-04-20 21:16 [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds Jon Hunter
2009-04-21  6:35 ` Ingo Molnar
2009-04-21 20:32   ` john stultz
2009-04-21 23:20     ` Jon Hunter
2009-04-22  0:02       ` john stultz
2009-05-07 14:52         ` Jon Hunter
2009-05-08  0:54           ` [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formore " john stultz
2009-05-08 16:05             ` Jon Hunter
2009-05-09  0:51               ` [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formorethan " john stultz
2009-05-12 23:35                 ` Jon Hunter
2009-05-12 23:58                   ` john stultz [this message]
2009-05-13 15:14                     ` [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formorethan2.15 seconds Jon Hunter
2009-05-13 16:41                       ` John Stultz
2009-05-13 17:54                         ` Jon Hunter
2009-05-13 19:21                           ` John Stultz
2009-05-15 16:35                             ` Jon Hunter
2009-05-15 18:55                               ` Jon Hunter
2009-05-16  1:29                                 ` John Stultz
2009-05-16  1:18                               ` John Stultz
2009-05-22 18:21                                 ` Jon Hunter
2009-05-22 19:23                                   ` john stultz
2009-05-22 19:54                                     ` Thomas Gleixner
2009-05-26 15:12                                       ` Jon Hunter
2009-05-26 20:26                                         ` john stultz
2009-05-22 19:59                                   ` Thomas Gleixner
2009-04-22  0:05       ` [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds john stultz
2009-04-22  3:07         ` Jon Hunter
2009-04-22 15:30           ` Chris Friesen
2009-04-22 17:04             ` Jon Hunter
2009-04-22 18:53               ` Geert Uytterhoeven

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1242172727.3462.55.camel@localhost \
    --to=johnstul@us.ibm.com \
    --cc=jon-hunter@ti.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).