public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Alexander Shishkin <virtuoso@slind.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	John Stultz <johnstul@us.ibm.com>,
	Chris Friesen <chris.friesen@genband.com>,
	Kay Sievers <kay.sievers@vrfy.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Davide Libenzi <davidel@xmailserver.org>
Subject: Re: [RFC][PATCH 1/4] clock_rtoffset: new syscall
Date: Thu, 28 Apr 2011 10:15:30 +0300	[thread overview]
Message-ID: <8762pyonhp.fsf@shisha.kicks-ass.net> (raw)
In-Reply-To: <alpine.LFD.2.02.1104271359580.3323@ionos>

On Wed, 27 Apr 2011 16:02:33 +0200 (CEST), Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 27 Apr 2011, Alexander Shishkin wrote:
> 
> > In order to keep track of system time changes, we introduce a new
> > syscall which returns the offset of CLOCK_MONOTONIC clock against
> > CLOCK_REALTIME. The caller is to store this value and use it in
> > system calls (like clock_nanosleep or timerfd_settime) that will
> > compare it against the effective offset in order to ensure that
> > the caller's notion of the system time corresponds to the effective
> > system time at the moment of the action being carried out. If it
> > has changed, these system calls will return an error and the caller
> > will have to obtain this offset again.
> 
> No, we do not expose kernel internals like this to user space. The
> kernel internal representation of time is subject to change.
> 
> Also abusing the reminder argument of clock_nanosleep for handing back
> the offset is a horrible hack including the non standard -ECANCELED
> return value. No, we don't change posix interfaces that way.

Hm, https://lkml.org/lkml/2011/3/10/471

> We can add something to timerfd, but definitly not with another
> syscall and by bloating hrtimer and sprinkling cancellation calls all
> over the place. clock_was_set() should be enough and it already calls
> into the hrtimer code, the resume path calls into it as well, so
> there is no need to introduce such a mess.

Agreed.

> The completely untested patch below should solve the same problem in a
> sane way. Restricted to timerfd, but that really should be sufficient.

Looks useful to me. FWIW,

Reviewed-by: Alexander Shishkin <virtuoso@slind.org>

Regards,
--
Alex

> 
> Thanks,
> 
> 	tglx
> 
> ------------->
> 
> Subject: timerfd: Allow timers to be cancelled when clock was set
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Wed, 27 Apr 2011 14:16:42 +0200
> 
> <Add proper changelog here>
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  fs/timerfd.c              |   57 +++++++++++++++++++++++++++++++++++++++++-----
>  include/linux/hrtimer.h   |    3 +-
>  include/linux/time.h      |    5 ++++
>  include/linux/timerfd.h   |    3 +-
>  kernel/hrtimer.c          |   41 +++++++++++++++++++++++++++------
>  kernel/time/timekeeping.c |   15 ++++++++++++
>  6 files changed, 109 insertions(+), 15 deletions(-)
> 
> Index: linux-2.6-tip/fs/timerfd.c
> ===================================================================
> --- linux-2.6-tip.orig/fs/timerfd.c
> +++ linux-2.6-tip/fs/timerfd.c
> @@ -26,10 +26,12 @@
>  struct timerfd_ctx {
>  	struct hrtimer tmr;
>  	ktime_t tintv;
> +	ktime_t moffs;
>  	wait_queue_head_t wqh;
>  	u64 ticks;
>  	int expired;
>  	int clockid;
> +	bool might_cancel;
>  };
>  
>  /*
> @@ -59,24 +61,52 @@ static ktime_t timerfd_get_remaining(str
>  	return remaining.tv64 < 0 ? ktime_set(0, 0): remaining;
>  }
>  
> -static void timerfd_setup(struct timerfd_ctx *ctx, int flags,
> -			  const struct itimerspec *ktmr)
> +static bool timerfd_cancelled(struct timerfd_ctx *ctx)
> +{
> +	ktime_t moffs;
> +
> +	if (!ctx->might_cancel)
> +		return false;
> +
> +	moffs = get_monotonic_offset();
> +
> +	if (moffs.tv64 == ctx->moffs.tv64)
> +		return false;
> +
> +	ctx->moffs = moffs;
> +	return true;
> +}
> +
> +static int timerfd_setup(struct timerfd_ctx *ctx, int flags,
> +			 const struct itimerspec *ktmr)
>  {
>  	enum hrtimer_mode htmode;
>  	ktime_t texp;
> +	int clockid = ctx->clockid;
>  
>  	htmode = (flags & TFD_TIMER_ABSTIME) ?
>  		HRTIMER_MODE_ABS: HRTIMER_MODE_REL;
>  
> +	ctx->might_cancel = false;
> +	if (htmode == HRTIMER_MODE_ABS && ctx->clockid == CLOCK_REALTIME &&
> +	    (flags & TFD_TIMER_CANCELON_SET)) {
> +		clockid = CLOCK_REALTIME_COS;
> +		ctx->might_cancel = true;
> +	}
> +
>  	texp = timespec_to_ktime(ktmr->it_value);
>  	ctx->expired = 0;
>  	ctx->ticks = 0;
>  	ctx->tintv = timespec_to_ktime(ktmr->it_interval);
> -	hrtimer_init(&ctx->tmr, ctx->clockid, htmode);
> +	hrtimer_init(&ctx->tmr, clockid, htmode);
>  	hrtimer_set_expires(&ctx->tmr, texp);
>  	ctx->tmr.function = timerfd_tmrproc;
> -	if (texp.tv64 != 0)
> +	if (texp.tv64 != 0) {
>  		hrtimer_start(&ctx->tmr, texp, htmode);
> +		if (timerfd_cancelled(ctx))
> +			return -ECANCELED;
> +	}
> +	return 0;
>  }
>  
>  static int timerfd_release(struct inode *inode, struct file *file)
> @@ -118,8 +148,21 @@ static ssize_t timerfd_read(struct file 
>  		res = -EAGAIN;
>  	else
>  		res = wait_event_interruptible_locked_irq(ctx->wqh, ctx->ticks);
> +
>  	if (ctx->ticks) {
>  		ticks = ctx->ticks;
> +
> +		/*
> +		 * If clock has changed, we do not care about the
> +		 * ticks and we do not rearm the timer. Userspace must
> +		 * reevaluate anyway.
> +		 */
> +		if (timerfd_cancelled(ctx)) {
> +			ticks = 0;
> +			ctx->expired = 0;
> +			res = -ECANCELED;
> +		}
> +
>  		if (ctx->expired && ctx->tintv.tv64) {
>  			/*
>  			 * If tintv.tv64 != 0, this is a periodic timer that
> @@ -183,6 +226,7 @@ SYSCALL_DEFINE2(timerfd_create, int, clo
>  	init_waitqueue_head(&ctx->wqh);
>  	ctx->clockid = clockid;
>  	hrtimer_init(&ctx->tmr, clockid, HRTIMER_MODE_ABS);
> +	ctx->moffs = get_monotonic_offset();
>  
>  	ufd = anon_inode_getfd("[timerfd]", &timerfd_fops, ctx,
>  			       O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
> @@ -199,6 +243,7 @@ SYSCALL_DEFINE4(timerfd_settime, int, uf
>  	struct file *file;
>  	struct timerfd_ctx *ctx;
>  	struct itimerspec ktmr, kotmr;
> +	int ret;
>  
>  	if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
>  		return -EFAULT;
> @@ -240,14 +285,14 @@ SYSCALL_DEFINE4(timerfd_settime, int, uf
>  	/*
>  	 * Re-program the timer to the new value ...
>  	 */
> -	timerfd_setup(ctx, flags, &ktmr);
> +	ret = timerfd_setup(ctx, flags, &ktmr);
>  
>  	spin_unlock_irq(&ctx->wqh.lock);
>  	fput(file);
>  	if (otmr && copy_to_user(otmr, &kotmr, sizeof(kotmr)))
>  		return -EFAULT;
>  
> -	return 0;
> +	return ret;
>  }
>  
>  SYSCALL_DEFINE2(timerfd_gettime, int, ufd, struct itimerspec __user *, otmr)
> Index: linux-2.6-tip/include/linux/hrtimer.h
> ===================================================================
> --- linux-2.6-tip.orig/include/linux/hrtimer.h
> +++ linux-2.6-tip/include/linux/hrtimer.h
> @@ -157,6 +157,7 @@ enum  hrtimer_base_type {
>  	HRTIMER_BASE_REALTIME,
>  	HRTIMER_BASE_MONOTONIC,
>  	HRTIMER_BASE_BOOTTIME,
> +	HRTIMER_BASE_REALTIME_COS,
>  	HRTIMER_MAX_CLOCK_BASES,
>  };
>  
> @@ -319,7 +320,7 @@ static inline int hrtimer_is_hres_active
>  extern ktime_t ktime_get(void);
>  extern ktime_t ktime_get_real(void);
>  extern ktime_t ktime_get_boottime(void);
> -
> +ktime_t get_monotonic_offset(void);
>  
>  DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
>  
> Index: linux-2.6-tip/include/linux/time.h
> ===================================================================
> --- linux-2.6-tip.orig/include/linux/time.h
> +++ linux-2.6-tip/include/linux/time.h
> @@ -295,6 +295,11 @@ struct itimerval {
>  #define CLOCK_MONOTONIC_COARSE		6
>  #define CLOCK_BOOTTIME			7
>  
> +#ifdef __KERNEL__
> +/* This clock is not exposed to user space */
> +#define CLOCK_REALTIME_COS		8
> +#endif
> +
>  /*
>   * The IDs of various hardware clocks:
>   */
> Index: linux-2.6-tip/include/linux/timerfd.h
> ===================================================================
> --- linux-2.6-tip.orig/include/linux/timerfd.h
> +++ linux-2.6-tip/include/linux/timerfd.h
> @@ -19,6 +19,7 @@
>   * shared O_* flags.
>   */
>  #define TFD_TIMER_ABSTIME (1 << 0)
> +#define TFD_TIMER_CANCELON_SET (1 << 1)
>  #define TFD_CLOEXEC O_CLOEXEC
>  #define TFD_NONBLOCK O_NONBLOCK
>  
> @@ -26,6 +27,6 @@
>  /* Flags for timerfd_create.  */
>  #define TFD_CREATE_FLAGS TFD_SHARED_FCNTL_FLAGS
>  /* Flags for timerfd_settime.  */
> -#define TFD_SETTIME_FLAGS TFD_TIMER_ABSTIME
> +#define TFD_SETTIME_FLAGS (TFD_TIMER_ABSTIME | TFD_TIMER_CANCELON_SET)
>  
>  #endif /* _LINUX_TIMERFD_H */
> Index: linux-2.6-tip/kernel/hrtimer.c
> ===================================================================
> --- linux-2.6-tip.orig/kernel/hrtimer.c
> +++ linux-2.6-tip/kernel/hrtimer.c
> @@ -78,6 +78,11 @@ DEFINE_PER_CPU(struct hrtimer_cpu_base, 
>  			.get_time = &ktime_get_boottime,
>  			.resolution = KTIME_LOW_RES,
>  		},
> +		{
> +			.index = CLOCK_REALTIME_COS,
> +			.get_time = &ktime_get_real,
> +			.resolution = KTIME_LOW_RES,
> +		},
>  	}
>  };
>  
> @@ -106,6 +111,7 @@ static void hrtimer_get_softirq_time(str
>  	base->clock_base[HRTIMER_BASE_REALTIME].softirq_time = xtim;
>  	base->clock_base[HRTIMER_BASE_MONOTONIC].softirq_time = mono;
>  	base->clock_base[HRTIMER_BASE_BOOTTIME].softirq_time = boot;
> +	base->clock_base[HRTIMER_BASE_REALTIME_COS].softirq_time = xtim;
>  }
>  
>  /*
> @@ -617,6 +623,8 @@ static int hrtimer_reprogram(struct hrti
>  	return res;
>  }
>  
> +static void
> +hrtimer_expire_cancelable(struct hrtimer_cpu_base *cpu_base, ktime_t now);
>  
>  /*
>   * Retrigger next event is called after clock was set
> @@ -626,13 +634,12 @@ static int hrtimer_reprogram(struct hrti
>  static void retrigger_next_event(void *arg)
>  {
>  	struct hrtimer_cpu_base *base;
> -	struct timespec realtime_offset, wtm, sleep;
> +	struct timespec realtime_offset, xtim, wtm, sleep;
>  
>  	if (!hrtimer_hres_active())
>  		return;
>  
> -	get_xtime_and_monotonic_and_sleep_offset(&realtime_offset, &wtm,
> -							&sleep);
> +	get_xtime_and_monotonic_and_sleep_offset(&xtim, &wtm, &sleep);
>  	set_normalized_timespec(&realtime_offset, -wtm.tv_sec, -wtm.tv_nsec);
>  
>  	base = &__get_cpu_var(hrtimer_bases);
> @@ -643,6 +650,10 @@ static void retrigger_next_event(void *a
>  		timespec_to_ktime(realtime_offset);
>  	base->clock_base[HRTIMER_BASE_BOOTTIME].offset =
>  		timespec_to_ktime(sleep);
> +	base->clock_base[HRTIMER_BASE_REALTIME_COS].offset =
> +		timespec_to_ktime(realtime_offset);
> +
> +	hrtimer_expire_cancelable(base, timespec_to_ktime(xtim));
>  
>  	hrtimer_force_reprogram(base, 0);
>  	raw_spin_unlock(&base->lock);
> @@ -715,7 +726,7 @@ static inline int hrtimer_enqueue_reprog
>   */
>  static int hrtimer_switch_to_hres(void)
>  {
> -	int cpu = smp_processor_id();
> +	int i, cpu = smp_processor_id();
>  	struct hrtimer_cpu_base *base = &per_cpu(hrtimer_bases, cpu);
>  	unsigned long flags;
>  
> @@ -731,9 +742,8 @@ static int hrtimer_switch_to_hres(void)
>  		return 0;
>  	}
>  	base->hres_active = 1;
> -	base->clock_base[HRTIMER_BASE_REALTIME].resolution = KTIME_HIGH_RES;
> -	base->clock_base[HRTIMER_BASE_MONOTONIC].resolution = KTIME_HIGH_RES;
> -	base->clock_base[HRTIMER_BASE_BOOTTIME].resolution = KTIME_HIGH_RES;
> +	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++)
> +		base->clock_base[i].resolution = KTIME_HIGH_RES;
>  
>  	tick_setup_sched_timer();
>  
> @@ -1221,6 +1231,22 @@ static void __run_hrtimer(struct hrtimer
>  	timer->state &= ~HRTIMER_STATE_CALLBACK;
>  }
>  
> +static void
> +hrtimer_expire_cancelable(struct hrtimer_cpu_base *cpu_base, ktime_t now)
> +{
> +	struct timerqueue_node *node;
> +	struct hrtimer_clock_base *base;
> +
> +	base = cpu_base->clock_base + HRTIMER_BASE_REALTIME_COS;
> +
> +	while ((node = timerqueue_getnext(&base->active))) {
> +			struct hrtimer *timer;
> +
> +			timer = container_of(node, struct hrtimer, node);
> +			__run_hrtimer(timer, &now);
> +	}
> +}
> +
>  #ifdef CONFIG_HIGH_RES_TIMERS
>  
>  /*
> @@ -1725,6 +1751,7 @@ void __init hrtimers_init(void)
>  	hrtimer_clock_to_base_table[CLOCK_REALTIME] = HRTIMER_BASE_REALTIME;
>  	hrtimer_clock_to_base_table[CLOCK_MONOTONIC] = HRTIMER_BASE_MONOTONIC;
>  	hrtimer_clock_to_base_table[CLOCK_BOOTTIME] = HRTIMER_BASE_BOOTTIME;
> +	hrtimer_clock_to_base_table[CLOCK_REALTIME_COS] = HRTIMER_BASE_REALTIME_COS;
>  
>  	hrtimer_cpu_notify(&hrtimers_nb, (unsigned long)CPU_UP_PREPARE,
>  			  (void *)(long)smp_processor_id());
> Index: linux-2.6-tip/kernel/time/timekeeping.c
> ===================================================================
> --- linux-2.6-tip.orig/kernel/time/timekeeping.c
> +++ linux-2.6-tip/kernel/time/timekeeping.c
> @@ -1049,6 +1049,21 @@ void get_xtime_and_monotonic_and_sleep_o
>  }
>  
>  /**
> + * get_monotonic_offset() - get wall_to_monotonic
> + */
> +ktime_t get_monotonic_offset(void)
> +{
> +	unsigned long seq;
> +	struct timespec wtom;
> +
> +	do {
> +		seq = read_seqbegin(&xtime_lock);
> +		wtom = wall_to_monotonic;
> +	} while (read_seqretry(&xtime_lock, seq));
> +	return timespec_to_ktime(wtom);
> +}
> +
> +/**
>   * xtime_update() - advances the timekeeping infrastructure
>   * @ticks:	number of ticks, that have elapsed since the last call.
>   *

      parent reply	other threads:[~2011-04-28  7:15 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-03-09 14:36 [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes Alexander Shishkin
2011-03-10  0:25 ` Andrew Morton
2011-03-10  0:36   ` Kay Sievers
2011-03-10  8:19     ` Alexander Shishkin
2011-03-10  9:08     ` Thomas Gleixner
2011-03-10 11:16       ` Jamie Lokier
2011-03-10 11:41         ` Thomas Gleixner
2011-03-10  2:01   ` Scott James Remnant
2011-03-10  8:25     ` Andrew Morton
2011-03-11 19:51       ` Scott James Remnant
2011-03-11 19:56         ` Thomas Gleixner
2011-03-15  1:53           ` Scott James Remnant
2011-03-10  8:10   ` Alexander Shishkin
2011-03-10  8:02 ` Kirill A. Shutemov
2011-03-10  8:15   ` Alexander Shishkin
2011-03-10  8:48 ` Arnd Bergmann
2011-03-10 14:19   ` Alexander Shishkin
2011-03-10  9:52 ` Thomas Gleixner
2011-03-10 14:12   ` Alexander Shishkin
2011-03-10 14:55     ` Thomas Gleixner
2011-03-10 15:43       ` Alexander Shishkin
2011-03-10 16:40         ` Thomas Gleixner
2011-03-10 21:57     ` Thomas Gleixner
2011-04-27 10:43       ` [RFC][PATCH 1/4] clock_rtoffset: new syscall Alexander Shishkin
2011-04-27 10:43         ` [RFC][PATCH 2/4] hrtimer: add cancellation when clock is set Alexander Shishkin
2011-04-27 10:43         ` [RFC][PATCH 3/4] hrtimer: add nanosleep cancellation Alexander Shishkin
2011-04-27 10:43         ` [RFC][PATCH 4/4] timerfd: add cancellation Alexander Shishkin
2011-04-27 14:02         ` [RFC][PATCH 1/4] clock_rtoffset: new syscall Thomas Gleixner
2011-04-27 19:11           ` john stultz
2011-04-27 22:19             ` Thomas Gleixner
2011-04-27 20:55           ` Kay Sievers
2011-04-29 17:32             ` Thomas Gleixner
2011-05-02  8:10               ` Alexander Shishkin
2011-04-28  7:15           ` Alexander Shishkin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8762pyonhp.fsf@shisha.kicks-ass.net \
    --to=virtuoso@slind.org \
    --cc=akpm@linux-foundation.org \
    --cc=chris.friesen@genband.com \
    --cc=davidel@xmailserver.org \
    --cc=johnstul@us.ibm.com \
    --cc=kay.sievers@vrfy.org \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox