linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
@ 2011-03-09 14:36 Alexander Shishkin
       [not found] ` <1299681411-9227-1-git-send-email-virtuoso-0lOfPCoBze7YtjvyW6yDsg@public.gmane.org>
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Alexander Shishkin @ 2011-03-09 14:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexander Shishkin, Ken MacLeod, Shaun Reich, Thomas Gleixner,
	Alexander Viro, Greg Kroah-Hartman, Feng Tang, Andrew Morton,
	Michael Tokarev, Marcelo Tosatti, John Stultz, Chris Friesen,
	Kay Sievers, Kirill A. Shutemov, Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel

Changes since v3:
 - changed timerfd_settime() semantics (see below)
Changes since v2:
 - replaced sysfs interface with a syscall
 - added sysctl/procfs handle to set a limit to the number of users
 - fixed issues pointed out by Greg.
Changes since v1:
 - updated against 2.6.36-rc1,
 - added notification/filtering options,
 - added Documentation/ABI/sysfs-kernel-time-notify interface description.

Certain userspace applications (like "clock" desktop applets or cron or
systemd) might want to be notified when some other application changes
the system time. There are several known to me reasons for this:
 - avoiding periodic wakeups to poll time changes;
 - rearming CLOCK_REALTIME timers when said changes happen;
 - changing system timekeeping policy for system-wide time management
   programs;
 - keeping guest applications/operating systems running in emulators
   up to date;
 - recalculation of traffic signal cycles in Advanced Traffic Controllers
   (ATC), which is part of ATC API requirements [1] as developed by ATC
   working group of the U.S. Institute of Transportation Engineers (ITE).

The major change since the previous version is the new semantics of
timerfd_settime() when it's called on a time change notification
descriptor: it will set the system time to utmr.it_value if the time
change counter is zero, otherwise it will return EBUSY, this is required
to prevent a race between setting the time and reading the counter, when
the time controlling procees changes the time immediately after another
process in the system did the same (the counter is greater than one),
that process' time change will be lost. Thus, the time controlling
process should use timerfd_settime() instead of clock_settime() or
settimeofday() to ensure that other processes' time changes don't get
lost.

This is another attempt to approach notifying userspace about system
clock changes. The other one is using an eventfd and a syscall [1]. In
the course of discussing the necessity of a syscall for this kind of
notifications, it was suggested that this functionality can be achieved
via timers [2] (and timerfd in particular [3]). This idea got quite
some support [4], [5], [6] and some vague criticism [7], so I decided
to try and go a bit further with it.

[1] http://www.ite.org/standards/atcapi/version2.asp
[2] http://marc.info/?l=linux-kernel&m=128950389423614&w=2
[3] http://marc.info/?l=linux-kernel&m=128951020831573&w=2
[4] http://marc.info/?l=linux-kernel&m=128951588006157&w=2
[5] http://marc.info/?l=linux-kernel&m=128951503205111&w=2
[6] http://marc.info/?l=linux-kernel&m=128955890118477&w=2
[7] http://marc.info/?l=linux-kernel&m=129002967031104&w=2
[8] http://marc.info/?l=linux-kernel&m=129002672227263&w=2

Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Greg Kroah-Hartman <gregkh@suse.de>
CC: Feng Tang <feng.tang@intel.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Michael Tokarev <mjt@tls.msk.ru>
CC: Marcelo Tosatti <mtosatti@redhat.com>
CC: John Stultz <johnstul@us.ibm.com>
CC: Chris Friesen <chris.friesen@genband.com>
CC: Kay Sievers <kay.sievers@vrfy.org>
CC: Kirill A. Shutemov <kirill@shutemov.name>
CC: Artem Bityutskiy <dedekind1@gmail.com>
CC: Davide Libenzi <davidel@xmailserver.org>
CC: linux-fsdevel@vger.kernel.org
CC: linux-kernel@vger.kernel.org
---
 fs/timerfd.c            |   94 ++++++++++++++++++++++++++++++++++++++++++-----
 include/linux/hrtimer.h |    6 +++
 include/linux/timerfd.h |    3 +-
 kernel/compat.c         |    5 ++-
 kernel/hrtimer.c        |    4 ++
 kernel/time.c           |   11 ++++-
 6 files changed, 109 insertions(+), 14 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index 8c4fc14..6170f61 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -22,6 +22,7 @@
 #include <linux/anon_inodes.h>
 #include <linux/timerfd.h>
 #include <linux/syscalls.h>
+#include <linux/security.h>
 
 struct timerfd_ctx {
 	struct hrtimer tmr;
@@ -30,8 +31,13 @@ struct timerfd_ctx {
 	u64 ticks;
 	int expired;
 	int clockid;
+	struct list_head notifiers_list;
 };
 
+/* TFD_NOTIFY_CLOCK_SET timers go here */
+static DEFINE_SPINLOCK(notifiers_lock);
+static LIST_HEAD(notifiers_list);
+
 /*
  * This gets called when the timer event triggers. We set the "expired"
  * flag, but we do not re-arm the timer (in case it's necessary,
@@ -51,10 +57,31 @@ static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
 	return HRTIMER_NORESTART;
 }
 
+void timerfd_clock_was_set(clockid_t clockid)
+{
+	struct timerfd_ctx *ctx;
+	unsigned long flags;
+
+	spin_lock(&notifiers_lock);
+	list_for_each_entry(ctx, &notifiers_list, notifiers_list) {
+		spin_lock_irqsave(&ctx->wqh.lock, flags);
+		if (ctx->tmr.base->index == clockid) {
+			ctx->ticks++;
+			wake_up_locked(&ctx->wqh);
+		}
+		spin_unlock_irqrestore(&ctx->wqh.lock, flags);
+	}
+	spin_unlock(&notifiers_lock);
+}
+
 static ktime_t timerfd_get_remaining(struct timerfd_ctx *ctx)
 {
 	ktime_t remaining;
 
+	/* for notification timers, return current time */
+	if (!list_empty(&ctx->notifiers_list))
+		return timespec_to_ktime(current_kernel_time());
+
 	remaining = hrtimer_expires_remaining(&ctx->tmr);
 	return remaining.tv64 < 0 ? ktime_set(0, 0): remaining;
 }
@@ -72,6 +99,12 @@ static void timerfd_setup(struct timerfd_ctx *ctx, int flags,
 	ctx->expired = 0;
 	ctx->ticks = 0;
 	ctx->tintv = timespec_to_ktime(ktmr->it_interval);
+
+	if (flags & TFD_NOTIFY_CLOCK_SET) {
+		list_add(&ctx->notifiers_list, &notifiers_list);
+		return;
+	}
+
 	hrtimer_init(&ctx->tmr, ctx->clockid, htmode);
 	hrtimer_set_expires(&ctx->tmr, texp);
 	ctx->tmr.function = timerfd_tmrproc;
@@ -83,7 +116,12 @@ static int timerfd_release(struct inode *inode, struct file *file)
 {
 	struct timerfd_ctx *ctx = file->private_data;
 
-	hrtimer_cancel(&ctx->tmr);
+	if (!list_empty(&ctx->notifiers_list)) {
+		spin_lock(&notifiers_lock);
+		list_del(&ctx->notifiers_list);
+		spin_unlock(&notifiers_lock);
+	} else
+		hrtimer_cancel(&ctx->tmr);
 	kfree(ctx);
 	return 0;
 }
@@ -113,6 +151,7 @@ static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
 
 	if (count < sizeof(ticks))
 		return -EINVAL;
+
 	spin_lock_irq(&ctx->wqh.lock);
 	if (file->f_flags & O_NONBLOCK)
 		res = -EAGAIN;
@@ -120,7 +159,8 @@ static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
 		res = wait_event_interruptible_locked_irq(ctx->wqh, ctx->ticks);
 	if (ctx->ticks) {
 		ticks = ctx->ticks;
-		if (ctx->expired && ctx->tintv.tv64) {
+		if (ctx->expired && ctx->tintv.tv64 &&
+		    list_empty(&ctx->notifiers_list)) {
 			/*
 			 * If tintv.tv64 != 0, this is a periodic timer that
 			 * needs to be re-armed. We avoid doing it in the timer
@@ -184,6 +224,8 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
 	ctx->clockid = clockid;
 	hrtimer_init(&ctx->tmr, clockid, HRTIMER_MODE_ABS);
 
+	INIT_LIST_HEAD(&ctx->notifiers_list);
+
 	ufd = anon_inode_getfd("[timerfd]", &timerfd_fops, ctx,
 			       O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
 	if (ufd < 0)
@@ -196,18 +238,24 @@ SYSCALL_DEFINE4(timerfd_settime, int, ufd, int, flags,
 		const struct itimerspec __user *, utmr,
 		struct itimerspec __user *, otmr)
 {
+	int ret = 0;
 	struct file *file;
 	struct timerfd_ctx *ctx;
 	struct itimerspec ktmr, kotmr;
 
-	if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
-		return -EFAULT;
-
-	if ((flags & ~TFD_SETTIME_FLAGS) ||
-	    !timespec_valid(&ktmr.it_value) ||
-	    !timespec_valid(&ktmr.it_interval))
+	if (flags & ~TFD_SETTIME_FLAGS)
 		return -EINVAL;
 
+	/* utmr may be NULL for notification timerfd */
+	if (!(flags & TFD_NOTIFY_CLOCK_SET) || utmr) {
+		if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
+			return -EFAULT;
+
+		if (!timespec_valid(&ktmr.it_value) ||
+		    !timespec_valid(&ktmr.it_interval))
+			return -EINVAL;
+	}
+
 	file = timerfd_fget(ufd);
 	if (IS_ERR(file))
 		return PTR_ERR(file);
@@ -218,10 +266,12 @@ SYSCALL_DEFINE4(timerfd_settime, int, ufd, int, flags,
 	 * it to the new values.
 	 */
 	for (;;) {
+		spin_lock(&notifiers_lock);
 		spin_lock_irq(&ctx->wqh.lock);
-		if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
+		if (!list_empty(&notifiers_list) || hrtimer_try_to_cancel(&ctx->tmr) >= 0)
 			break;
 		spin_unlock_irq(&ctx->wqh.lock);
+		spin_unlock(&notifiers_lock);
 		cpu_relax();
 	}
 
@@ -238,16 +288,39 @@ SYSCALL_DEFINE4(timerfd_settime, int, ufd, int, flags,
 	kotmr.it_interval = ktime_to_timespec(ctx->tintv);
 
 	/*
+	 * for the notification timerfd, set current time to it_value
+	 * if the timer hasn't expired; otherwise someone has changed
+	 * the system time to the value that we don't know
+	 */
+	if (!list_empty(&ctx->notifiers_list) && utmr) {
+		if (ctx->ticks) {
+			ret = -EBUSY;
+			goto out;
+		}
+
+		ret = security_settime(&ktmr.it_value, NULL);
+		if (ret)
+			goto out;
+
+		spin_unlock_irq(&ctx->wqh.lock);
+		ret = do_settimeofday(&ktmr.it_value);
+		goto out1;
+	}
+
+	/*
 	 * Re-program the timer to the new value ...
 	 */
 	timerfd_setup(ctx, flags, &ktmr);
 
+out:
 	spin_unlock_irq(&ctx->wqh.lock);
+out1:
+	spin_unlock(&notifiers_lock);
 	fput(file);
 	if (otmr && copy_to_user(otmr, &kotmr, sizeof(kotmr)))
 		return -EFAULT;
 
-	return 0;
+	return ret;
 }
 
 SYSCALL_DEFINE2(timerfd_gettime, int, ufd, struct itimerspec __user *, otmr)
@@ -273,6 +346,7 @@ SYSCALL_DEFINE2(timerfd_gettime, int, ufd, struct itimerspec __user *, otmr)
 	spin_unlock_irq(&ctx->wqh.lock);
 	fput(file);
 
+out:
 	return copy_to_user(otmr, &kotmr, sizeof(kotmr)) ? -EFAULT: 0;
 }
 
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 6bc1804..991e8d9 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -248,6 +248,12 @@ static inline ktime_t hrtimer_expires_remaining(const struct hrtimer *timer)
 	return ktime_sub(timer->node.expires, timer->base->get_time());
 }
 
+#ifdef CONFIG_TIMERFD
+extern void timerfd_clock_was_set(clockid_t clockid);
+#else
+static inline void timerfd_clock_was_set(clockid_t clockid) {}
+#endif
+
 #ifdef CONFIG_HIGH_RES_TIMERS
 struct clock_event_device;
 
diff --git a/include/linux/timerfd.h b/include/linux/timerfd.h
index 2d07929..c3ddad9 100644
--- a/include/linux/timerfd.h
+++ b/include/linux/timerfd.h
@@ -19,6 +19,7 @@
  * shared O_* flags.
  */
 #define TFD_TIMER_ABSTIME (1 << 0)
+#define TFD_NOTIFY_CLOCK_SET (1 << 1)
 #define TFD_CLOEXEC O_CLOEXEC
 #define TFD_NONBLOCK O_NONBLOCK
 
@@ -26,6 +27,6 @@
 /* Flags for timerfd_create.  */
 #define TFD_CREATE_FLAGS TFD_SHARED_FCNTL_FLAGS
 /* Flags for timerfd_settime.  */
-#define TFD_SETTIME_FLAGS TFD_TIMER_ABSTIME
+#define TFD_SETTIME_FLAGS (TFD_TIMER_ABSTIME | TFD_NOTIFY_CLOCK_SET)
 
 #endif /* _LINUX_TIMERFD_H */
diff --git a/kernel/compat.c b/kernel/compat.c
index 38b1d2c..b1cf3e1 100644
--- a/kernel/compat.c
+++ b/kernel/compat.c
@@ -995,7 +995,10 @@ asmlinkage long compat_sys_stime(compat_time_t __user *tptr)
 	if (err)
 		return err;
 
-	do_settimeofday(&tv);
+	err = do_settimeofday(&tv);
+	if (!err)
+		timerfd_clock_was_set(CLOCK_REALTIME);
+
 	return 0;
 }
 
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 2c3d6e5..469eef6 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -663,6 +663,7 @@ void clock_was_set(void)
 {
 	/* Retrigger the CPU local events everywhere */
 	on_each_cpu(retrigger_next_event, NULL, 1);
+
 }
 
 /*
@@ -675,6 +676,9 @@ void hres_timers_resume(void)
 		  KERN_INFO "hres_timers_resume() called with IRQs enabled!");
 
 	retrigger_next_event(NULL);
+
+	/* Trigger timerfd notifiers */
+	timerfd_clock_was_set(CLOCK_MONOTONIC);
 }
 
 /*
diff --git a/kernel/time.c b/kernel/time.c
index 6430a75..b06f759 100644
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -92,7 +92,10 @@ SYSCALL_DEFINE1(stime, time_t __user *, tptr)
 	if (err)
 		return err;
 
-	do_settimeofday(&tv);
+	err = do_settimeofday(&tv);
+	if (!err)
+		timerfd_clock_was_set(CLOCK_REALTIME);
+
 	return 0;
 }
 
@@ -177,7 +180,11 @@ int do_sys_settimeofday(const struct timespec *tv, const struct timezone *tz)
 		/* SMP safe, again the code in arch/foo/time.c should
 		 * globally block out interrupts when it runs.
 		 */
-		return do_settimeofday(tv);
+		error = do_settimeofday(tv);
+		if (!error)
+			timerfd_clock_was_set(CLOCK_REALTIME);
+
+		return error;
 	}
 	return 0;
 }
-- 
1.7.2.1.45.gb66c2

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
       [not found] ` <1299681411-9227-1-git-send-email-virtuoso-0lOfPCoBze7YtjvyW6yDsg@public.gmane.org>
@ 2011-03-10  0:25   ` Andrew Morton
       [not found]     ` <20110309162513.5058c824.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
  2011-03-10  8:10     ` Alexander Shishkin
  0 siblings, 2 replies; 23+ messages in thread
From: Andrew Morton @ 2011-03-10  0:25 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ken MacLeod, Shaun Reich,
	Thomas Gleixner, Alexander Viro, Greg Kroah-Hartman, Feng Tang,
	Michael Tokarev, Marcelo Tosatti, John Stultz, Chris Friesen,
	Kay Sievers, Kirill A. Shutemov, Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk

On Wed,  9 Mar 2011 16:36:51 +0200
Alexander Shishkin <virtuoso-0lOfPCoBze7YtjvyW6yDsg@public.gmane.org> wrote:

> Changes since v3:
>  - changed timerfd_settime() semantics (see below)
> Changes since v2:
>  - replaced sysfs interface with a syscall
>  - added sysctl/procfs handle to set a limit to the number of users
>  - fixed issues pointed out by Greg.
> Changes since v1:
>  - updated against 2.6.36-rc1,
>  - added notification/filtering options,
>  - added Documentation/ABI/sysfs-kernel-time-notify interface description.
> 
> Certain userspace applications (like "clock" desktop applets or cron or
> systemd) might want to be notified when some other application changes
> the system time. There are several known to me reasons for this:
>  - avoiding periodic wakeups to poll time changes;
>  - rearming CLOCK_REALTIME timers when said changes happen;
>  - changing system timekeeping policy for system-wide time management
>    programs;
>  - keeping guest applications/operating systems running in emulators
>    up to date;
>  - recalculation of traffic signal cycles in Advanced Traffic Controllers
>    (ATC), which is part of ATC API requirements [1] as developed by ATC
>    working group of the U.S. Institute of Transportation Engineers (ITE).
> 
> The major change since the previous version is the new semantics of
> timerfd_settime() when it's called on a time change notification
> descriptor: it will set the system time to utmr.it_value if the time
> change counter is zero, otherwise it will return EBUSY, this is required
> to prevent a race between setting the time and reading the counter, when
> the time controlling procees changes the time immediately after another
> process in the system did the same (the counter is greater than one),
> that process' time change will be lost. Thus, the time controlling
> process should use timerfd_settime() instead of clock_settime() or
> settimeofday() to ensure that other processes' time changes don't get
> lost.
> 
> This is another attempt to approach notifying userspace about system
> clock changes. The other one is using an eventfd and a syscall [1]. In
> the course of discussing the necessity of a syscall for this kind of
> notifications, it was suggested that this functionality can be achieved
> via timers [2] (and timerfd in particular [3]). This idea got quite
> some support [4], [5], [6] and some vague criticism [7], so I decided
> to try and go a bit further with it.

Looks sane to me.  The changelog is a bit of a pickle - I stole a
useful-looking paragraph from your other please-ignore-this-email
email.

I also cc'ed Michael and linux-api: there isn't much point in adding
kernel features if we don't tell anyone about them.

It would be helpful to know if the identified users of this feature
actually find it useful and adequate.  I guess the most common
application is the 1,001 desktop clock widgets.  Do you have any
feedback from any of the owners of those?

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
       [not found]     ` <20110309162513.5058c824.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2011-03-10  0:36       ` Kay Sievers
  2011-03-10  8:19         ` Alexander Shishkin
  2011-03-10  9:08         ` Thomas Gleixner
  2011-03-10  2:01       ` Scott James Remnant
  1 sibling, 2 replies; 23+ messages in thread
From: Kay Sievers @ 2011-03-10  0:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Shishkin, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Ken MacLeod, Shaun Reich, Thomas Gleixner, Alexander Viro,
	Greg Kroah-Hartman, Feng Tang, Michael Tokarev, Marcelo Tosatti,
	John Stultz, Chris Friesen, Kirill A. Shutemov, Artem Bityutskiy,
	Davide Libenzi, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk

On Thu, Mar 10, 2011 at 01:25, Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
> On Wed,  9 Mar 2011 16:36:51 +0200
> Alexander Shishkin <virtuoso-0lOfPCoBze7YtjvyW6yDsg@public.gmane.org> wrote:
>
>> Changes since v3:
>>  - changed timerfd_settime() semantics (see below)
>> Changes since v2:
>>  - replaced sysfs interface with a syscall
>>  - added sysctl/procfs handle to set a limit to the number of users
>>  - fixed issues pointed out by Greg.
>> Changes since v1:
>>  - updated against 2.6.36-rc1,
>>  - added notification/filtering options,
>>  - added Documentation/ABI/sysfs-kernel-time-notify interface description.

> It would be helpful to know if the identified users of this feature
> actually find it useful and adequate.  I guess the most common
> application is the 1,001 desktop clock widgets.  Do you have any
> feedback from any of the owners of those?

We want it for systemd, to provide cron-like functionality but without
the need to stupidly wake up every minute and check the system time
for possible jumps.

It also sounds useful for a generic resume (after system suspend)
notification for applications, which isn't really possible today.

Kay

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
       [not found]     ` <20110309162513.5058c824.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
  2011-03-10  0:36       ` Kay Sievers
@ 2011-03-10  2:01       ` Scott James Remnant
       [not found]         ` <AANLkTin7=rK2Sc2D7aMhPnPWU9LJq7KMSvwxv_PHERcK-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 23+ messages in thread
From: Scott James Remnant @ 2011-03-10  2:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Shishkin, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Ken MacLeod, Shaun Reich, Thomas Gleixner, Alexander Viro,
	Greg Kroah-Hartman, Feng Tang, Michael Tokarev, Marcelo Tosatti,
	John Stultz, Chris Friesen, Kay Sievers, Kirill A. Shutemov,
	Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk

On Wed, Mar 9, 2011 at 4:25 PM, Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3Sm6D+HspMUB@public.gmane.orgg> wrote:
> On Wed,  9 Mar 2011 16:36:51 +0200
> Alexander Shishkin <virtuoso-0lOfPCoBze7YtjvyW6yDsg@public.gmane.org> wrote:
>
>> Changes since v3:
>>  - changed timerfd_settime() semantics (see below)
>> Changes since v2:
>>  - replaced sysfs interface with a syscall
>>  - added sysctl/procfs handle to set a limit to the number of users
>>  - fixed issues pointed out by Greg.
>> Changes since v1:
>>  - updated against 2.6.36-rc1,
>>  - added notification/filtering options,
>>  - added Documentation/ABI/sysfs-kernel-time-notify interface description.
>>
>> Certain userspace applications (like "clock" desktop applets or cron or
>> systemd) might want to be notified when some other application changes
>> the system time. There are several known to me reasons for this:
>>  - avoiding periodic wakeups to poll time changes;
>>  - rearming CLOCK_REALTIME timers when said changes happen;
>>  - changing system timekeeping policy for system-wide time management
>>    programs;
>>  - keeping guest applications/operating systems running in emulators
>>    up to date;
>>  - recalculation of traffic signal cycles in Advanced Traffic Controllers
>>    (ATC), which is part of ATC API requirements [1] as developed by ATC
>>    working group of the U.S. Institute of Transportation Engineers (ITE).
>>
>> The major change since the previous version is the new semantics of
>> timerfd_settime() when it's called on a time change notification
>> descriptor: it will set the system time to utmr.it_value if the time
>> change counter is zero, otherwise it will return EBUSY, this is required
>> to prevent a race between setting the time and reading the counter, when
>> the time controlling procees changes the time immediately after another
>> process in the system did the same (the counter is greater than one),
>> that process' time change will be lost. Thus, the time controlling
>> process should use timerfd_settime() instead of clock_settime() or
>> settimeofday() to ensure that other processes' time changes don't get
>> lost.
>>
>> This is another attempt to approach notifying userspace about system
>> clock changes. The other one is using an eventfd and a syscall [1]. In
>> the course of discussing the necessity of a syscall for this kind of
>> notifications, it was suggested that this functionality can be achieved
>> via timers [2] (and timerfd in particular [3]). This idea got quite
>> some support [4], [5], [6] and some vague criticism [7], so I decided
>> to try and go a bit further with it.
>
> Looks sane to me.  The changelog is a bit of a pickle - I stole a
> useful-looking paragraph from your other please-ignore-this-email
> email.
>
> I also cc'ed Michael and linux-api: there isn't much point in adding
> kernel features if we don't tell anyone about them.
>
> It would be helpful to know if the identified users of this feature
> actually find it useful and adequate.  I guess the most common
> application is the 1,001 desktop clock widgets.  Do you have any
> feedback from any of the owners of those?
>
cron is another obvious one (or init systems attempting to replace
cron). Having to wakeup and check the time every minute can be
non-conducive to power savings, it would be better if we could just
sleep until the next alarm and be woken up if the time changes in
between.

(That being said, we also need to poll for and/or check for timezone
changes - but those are entirely userspace, so we can deal with that
separately)

Scott

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
  2011-03-09 14:36 [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes Alexander Shishkin
       [not found] ` <1299681411-9227-1-git-send-email-virtuoso-0lOfPCoBze7YtjvyW6yDsg@public.gmane.org>
@ 2011-03-10  8:02 ` Kirill A. Shutemov
  2011-03-10  8:15   ` Alexander Shishkin
  2011-03-10  8:48 ` Arnd Bergmann
  2011-03-10  9:52 ` Thomas Gleixner
  3 siblings, 1 reply; 23+ messages in thread
From: Kirill A. Shutemov @ 2011-03-10  8:02 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: linux-kernel, Ken MacLeod, Shaun Reich, Thomas Gleixner,
	Alexander Viro, Greg Kroah-Hartman, Feng Tang, Andrew Morton,
	Michael Tokarev, Marcelo Tosatti, John Stultz, Chris Friesen,
	Kay Sievers, Artem Bityutskiy, Davide Libenzi, linux-fsdevel

On Wed, Mar 09, 2011 at 04:36:51PM +0200, Alexander Shishkin wrote:
> Changes since v3:
>  - changed timerfd_settime() semantics (see below)
> Changes since v2:
>  - replaced sysfs interface with a syscall
>  - added sysctl/procfs handle to set a limit to the number of users
>  - fixed issues pointed out by Greg.
> Changes since v1:
>  - updated against 2.6.36-rc1,
>  - added notification/filtering options,
>  - added Documentation/ABI/sysfs-kernel-time-notify interface description.
> 
> Certain userspace applications (like "clock" desktop applets or cron or
> systemd) might want to be notified when some other application changes
> the system time. There are several known to me reasons for this:
>  - avoiding periodic wakeups to poll time changes;
>  - rearming CLOCK_REALTIME timers when said changes happen;
>  - changing system timekeeping policy for system-wide time management
>    programs;
>  - keeping guest applications/operating systems running in emulators
>    up to date;
>  - recalculation of traffic signal cycles in Advanced Traffic Controllers
>    (ATC), which is part of ATC API requirements [1] as developed by ATC
>    working group of the U.S. Institute of Transportation Engineers (ITE).
> 
> The major change since the previous version is the new semantics of
> timerfd_settime() when it's called on a time change notification
> descriptor: it will set the system time to utmr.it_value if the time
> change counter is zero, otherwise it will return EBUSY, this is required
> to prevent a race between setting the time and reading the counter, when
> the time controlling procees changes the time immediately after another
> process in the system did the same (the counter is greater than one),
> that process' time change will be lost. Thus, the time controlling
> process should use timerfd_settime() instead of clock_settime() or
> settimeofday() to ensure that other processes' time changes don't get
> lost.
> 
> This is another attempt to approach notifying userspace about system
> clock changes. The other one is using an eventfd and a syscall [1]. In
> the course of discussing the necessity of a syscall for this kind of
> notifications, it was suggested that this functionality can be achieved
> via timers [2] (and timerfd in particular [3]). This idea got quite
> some support [4], [5], [6] and some vague criticism [7], so I decided
> to try and go a bit further with it.
> 
> [1] http://www.ite.org/standards/atcapi/version2.asp
> [2] http://marc.info/?l=linux-kernel&m=128950389423614&w=2
> [3] http://marc.info/?l=linux-kernel&m=128951020831573&w=2
> [4] http://marc.info/?l=linux-kernel&m=128951588006157&w=2
> [5] http://marc.info/?l=linux-kernel&m=128951503205111&w=2
> [6] http://marc.info/?l=linux-kernel&m=128955890118477&w=2
> [7] http://marc.info/?l=linux-kernel&m=129002967031104&w=2
> [8] http://marc.info/?l=linux-kernel&m=129002672227263&w=2
> 
> Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Alexander Viro <viro@zeniv.linux.org.uk>
> CC: Greg Kroah-Hartman <gregkh@suse.de>
> CC: Feng Tang <feng.tang@intel.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Michael Tokarev <mjt@tls.msk.ru>
> CC: Marcelo Tosatti <mtosatti@redhat.com>
> CC: John Stultz <johnstul@us.ibm.com>
> CC: Chris Friesen <chris.friesen@genband.com>
> CC: Kay Sievers <kay.sievers@vrfy.org>
> CC: Kirill A. Shutemov <kirill@shutemov.name>
> CC: Artem Bityutskiy <dedekind1@gmail.com>
> CC: Davide Libenzi <davidel@xmailserver.org>
> CC: linux-fsdevel@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> ---
>  fs/timerfd.c            |   94 ++++++++++++++++++++++++++++++++++++++++++-----
>  include/linux/hrtimer.h |    6 +++
>  include/linux/timerfd.h |    3 +-
>  kernel/compat.c         |    5 ++-
>  kernel/hrtimer.c        |    4 ++
>  kernel/time.c           |   11 ++++-
>  6 files changed, 109 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/timerfd.c b/fs/timerfd.c
> index 8c4fc14..6170f61 100644
> --- a/fs/timerfd.c
> +++ b/fs/timerfd.c
> @@ -22,6 +22,7 @@
>  #include <linux/anon_inodes.h>
>  #include <linux/timerfd.h>
>  #include <linux/syscalls.h>
> +#include <linux/security.h>
>  
>  struct timerfd_ctx {
>  	struct hrtimer tmr;
> @@ -30,8 +31,13 @@ struct timerfd_ctx {
>  	u64 ticks;
>  	int expired;
>  	int clockid;
> +	struct list_head notifiers_list;
>  };
>  
> +/* TFD_NOTIFY_CLOCK_SET timers go here */
> +static DEFINE_SPINLOCK(notifiers_lock);
> +static LIST_HEAD(notifiers_list);
> +
>  /*
>   * This gets called when the timer event triggers. We set the "expired"
>   * flag, but we do not re-arm the timer (in case it's necessary,
> @@ -51,10 +57,31 @@ static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
>  	return HRTIMER_NORESTART;
>  }
>  
> +void timerfd_clock_was_set(clockid_t clockid)
> +{
> +	struct timerfd_ctx *ctx;
> +	unsigned long flags;
> +
> +	spin_lock(&notifiers_lock);
> +	list_for_each_entry(ctx, &notifiers_list, notifiers_list) {
> +		spin_lock_irqsave(&ctx->wqh.lock, flags);
> +		if (ctx->tmr.base->index == clockid) {
> +			ctx->ticks++;
> +			wake_up_locked(&ctx->wqh);
> +		}
> +		spin_unlock_irqrestore(&ctx->wqh.lock, flags);
> +	}
> +	spin_unlock(&notifiers_lock);
> +}
> +
>  static ktime_t timerfd_get_remaining(struct timerfd_ctx *ctx)
>  {
>  	ktime_t remaining;
>  
> +	/* for notification timers, return current time */
> +	if (!list_empty(&ctx->notifiers_list))
> +		return timespec_to_ktime(current_kernel_time());
> +
>  	remaining = hrtimer_expires_remaining(&ctx->tmr);
>  	return remaining.tv64 < 0 ? ktime_set(0, 0): remaining;
>  }
> @@ -72,6 +99,12 @@ static void timerfd_setup(struct timerfd_ctx *ctx, int flags,
>  	ctx->expired = 0;
>  	ctx->ticks = 0;
>  	ctx->tintv = timespec_to_ktime(ktmr->it_interval);
> +
> +	if (flags & TFD_NOTIFY_CLOCK_SET) {
> +		list_add(&ctx->notifiers_list, &notifiers_list);
> +		return;
> +	}
> +
>  	hrtimer_init(&ctx->tmr, ctx->clockid, htmode);
>  	hrtimer_set_expires(&ctx->tmr, texp);
>  	ctx->tmr.function = timerfd_tmrproc;
> @@ -83,7 +116,12 @@ static int timerfd_release(struct inode *inode, struct file *file)
>  {
>  	struct timerfd_ctx *ctx = file->private_data;
>  
> -	hrtimer_cancel(&ctx->tmr);
> +	if (!list_empty(&ctx->notifiers_list)) {
> +		spin_lock(&notifiers_lock);
> +		list_del(&ctx->notifiers_list);
> +		spin_unlock(&notifiers_lock);
> +	} else
> +		hrtimer_cancel(&ctx->tmr);
>  	kfree(ctx);
>  	return 0;
>  }
> @@ -113,6 +151,7 @@ static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
>  
>  	if (count < sizeof(ticks))
>  		return -EINVAL;
> +
>  	spin_lock_irq(&ctx->wqh.lock);
>  	if (file->f_flags & O_NONBLOCK)
>  		res = -EAGAIN;

Whitespace changes?

> @@ -120,7 +159,8 @@ static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
>  		res = wait_event_interruptible_locked_irq(ctx->wqh, ctx->ticks);
>  	if (ctx->ticks) {
>  		ticks = ctx->ticks;
> -		if (ctx->expired && ctx->tintv.tv64) {
> +		if (ctx->expired && ctx->tintv.tv64 &&
> +		    list_empty(&ctx->notifiers_list)) {
>  			/*
>  			 * If tintv.tv64 != 0, this is a periodic timer that
>  			 * needs to be re-armed. We avoid doing it in the timer
> @@ -184,6 +224,8 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
>  	ctx->clockid = clockid;
>  	hrtimer_init(&ctx->tmr, clockid, HRTIMER_MODE_ABS);
>  
> +	INIT_LIST_HEAD(&ctx->notifiers_list);
> +
>  	ufd = anon_inode_getfd("[timerfd]", &timerfd_fops, ctx,
>  			       O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
>  	if (ufd < 0)
> @@ -196,18 +238,24 @@ SYSCALL_DEFINE4(timerfd_settime, int, ufd, int, flags,
>  		const struct itimerspec __user *, utmr,
>  		struct itimerspec __user *, otmr)
>  {
> +	int ret = 0;
>  	struct file *file;
>  	struct timerfd_ctx *ctx;
>  	struct itimerspec ktmr, kotmr;
>  
> -	if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
> -		return -EFAULT;
> -
> -	if ((flags & ~TFD_SETTIME_FLAGS) ||
> -	    !timespec_valid(&ktmr.it_value) ||
> -	    !timespec_valid(&ktmr.it_interval))
> +	if (flags & ~TFD_SETTIME_FLAGS)
>  		return -EINVAL;
>  
> +	/* utmr may be NULL for notification timerfd */
> +	if (!(flags & TFD_NOTIFY_CLOCK_SET) || utmr) {
> +		if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
> +			return -EFAULT;
> +
> +		if (!timespec_valid(&ktmr.it_value) ||
> +		    !timespec_valid(&ktmr.it_interval))
> +			return -EINVAL;
> +	}
> +
>  	file = timerfd_fget(ufd);
>  	if (IS_ERR(file))
>  		return PTR_ERR(file);
> @@ -218,10 +266,12 @@ SYSCALL_DEFINE4(timerfd_settime, int, ufd, int, flags,
>  	 * it to the new values.
>  	 */
>  	for (;;) {
> +		spin_lock(&notifiers_lock);
>  		spin_lock_irq(&ctx->wqh.lock);
> -		if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
> +		if (!list_empty(&notifiers_list) || hrtimer_try_to_cancel(&ctx->tmr) >= 0)
>  			break;
>  		spin_unlock_irq(&ctx->wqh.lock);
> +		spin_unlock(&notifiers_lock);
>  		cpu_relax();
>  	}
>  
> @@ -238,16 +288,39 @@ SYSCALL_DEFINE4(timerfd_settime, int, ufd, int, flags,
>  	kotmr.it_interval = ktime_to_timespec(ctx->tintv);
>  
>  	/*
> +	 * for the notification timerfd, set current time to it_value
> +	 * if the timer hasn't expired; otherwise someone has changed
> +	 * the system time to the value that we don't know
> +	 */
> +	if (!list_empty(&ctx->notifiers_list) && utmr) {
> +		if (ctx->ticks) {
> +			ret = -EBUSY;
> +			goto out;
> +		}
> +
> +		ret = security_settime(&ktmr.it_value, NULL);
> +		if (ret)
> +			goto out;
> +
> +		spin_unlock_irq(&ctx->wqh.lock);
> +		ret = do_settimeofday(&ktmr.it_value);
> +		goto out1;
> +	}
> +
> +	/*
>  	 * Re-program the timer to the new value ...
>  	 */
>  	timerfd_setup(ctx, flags, &ktmr);
>  
> +out:
>  	spin_unlock_irq(&ctx->wqh.lock);
> +out1:
> +	spin_unlock(&notifiers_lock);
>  	fput(file);
>  	if (otmr && copy_to_user(otmr, &kotmr, sizeof(kotmr)))
>  		return -EFAULT;

What's reason to do copy_to_user() in case of error?
Is it safe for error from security_settime()?

> -	return 0;
> +	return ret;
>  }
>  
>  SYSCALL_DEFINE2(timerfd_gettime, int, ufd, struct itimerspec __user *, otmr)
> @@ -273,6 +346,7 @@ SYSCALL_DEFINE2(timerfd_gettime, int, ufd, struct itimerspec __user *, otmr)
>  	spin_unlock_irq(&ctx->wqh.lock);
>  	fput(file);
>  
> +out:
>  	return copy_to_user(otmr, &kotmr, sizeof(kotmr)) ? -EFAULT: 0;
>  }

This lable is not needed.

> diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
> index 6bc1804..991e8d9 100644
> --- a/include/linux/hrtimer.h
> +++ b/include/linux/hrtimer.h
> @@ -248,6 +248,12 @@ static inline ktime_t hrtimer_expires_remaining(const struct hrtimer *timer)
>  	return ktime_sub(timer->node.expires, timer->base->get_time());
>  }
>  
> +#ifdef CONFIG_TIMERFD
> +extern void timerfd_clock_was_set(clockid_t clockid);
> +#else
> +static inline void timerfd_clock_was_set(clockid_t clockid) {}
> +#endif
> +
>  #ifdef CONFIG_HIGH_RES_TIMERS
>  struct clock_event_device;
>  
> diff --git a/include/linux/timerfd.h b/include/linux/timerfd.h
> index 2d07929..c3ddad9 100644
> --- a/include/linux/timerfd.h
> +++ b/include/linux/timerfd.h
> @@ -19,6 +19,7 @@
>   * shared O_* flags.
>   */
>  #define TFD_TIMER_ABSTIME (1 << 0)
> +#define TFD_NOTIFY_CLOCK_SET (1 << 1)
>  #define TFD_CLOEXEC O_CLOEXEC
>  #define TFD_NONBLOCK O_NONBLOCK
>  
> @@ -26,6 +27,6 @@
>  /* Flags for timerfd_create.  */
>  #define TFD_CREATE_FLAGS TFD_SHARED_FCNTL_FLAGS
>  /* Flags for timerfd_settime.  */
> -#define TFD_SETTIME_FLAGS TFD_TIMER_ABSTIME
> +#define TFD_SETTIME_FLAGS (TFD_TIMER_ABSTIME | TFD_NOTIFY_CLOCK_SET)
>  
>  #endif /* _LINUX_TIMERFD_H */
> diff --git a/kernel/compat.c b/kernel/compat.c
> index 38b1d2c..b1cf3e1 100644
> --- a/kernel/compat.c
> +++ b/kernel/compat.c
> @@ -995,7 +995,10 @@ asmlinkage long compat_sys_stime(compat_time_t __user *tptr)
>  	if (err)
>  		return err;
>  
> -	do_settimeofday(&tv);
> +	err = do_settimeofday(&tv);
> +	if (!err)
> +		timerfd_clock_was_set(CLOCK_REALTIME);
> +
>  	return 0;
>  }
>  
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index 2c3d6e5..469eef6 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -663,6 +663,7 @@ void clock_was_set(void)
>  {
>  	/* Retrigger the CPU local events everywhere */
>  	on_each_cpu(retrigger_next_event, NULL, 1);
> +
>  }
>  
>  /*

Whitespace changes?

> @@ -675,6 +676,9 @@ void hres_timers_resume(void)
>  		  KERN_INFO "hres_timers_resume() called with IRQs enabled!");
>  
>  	retrigger_next_event(NULL);
> +
> +	/* Trigger timerfd notifiers */
> +	timerfd_clock_was_set(CLOCK_MONOTONIC);
>  }
>  
>  /*
> diff --git a/kernel/time.c b/kernel/time.c
> index 6430a75..b06f759 100644
> --- a/kernel/time.c
> +++ b/kernel/time.c
> @@ -92,7 +92,10 @@ SYSCALL_DEFINE1(stime, time_t __user *, tptr)
>  	if (err)
>  		return err;
>  
> -	do_settimeofday(&tv);
> +	err = do_settimeofday(&tv);
> +	if (!err)
> +		timerfd_clock_was_set(CLOCK_REALTIME);
> +
>  	return 0;
>  }
>  
> @@ -177,7 +180,11 @@ int do_sys_settimeofday(const struct timespec *tv, const struct timezone *tz)
>  		/* SMP safe, again the code in arch/foo/time.c should
>  		 * globally block out interrupts when it runs.
>  		 */
> -		return do_settimeofday(tv);
> +		error = do_settimeofday(tv);
> +		if (!error)
> +			timerfd_clock_was_set(CLOCK_REALTIME);
> +
> +		return error;
>  	}
>  	return 0;
>  }
> -- 
> 1.7.2.1.45.gb66c2
> 

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
  2011-03-10  0:25   ` Andrew Morton
       [not found]     ` <20110309162513.5058c824.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2011-03-10  8:10     ` Alexander Shishkin
  1 sibling, 0 replies; 23+ messages in thread
From: Alexander Shishkin @ 2011-03-10  8:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Ken MacLeod, Shaun Reich, Thomas Gleixner,
	Alexander Viro, Greg Kroah-Hartman, Feng Tang, Michael Tokarev,
	Marcelo Tosatti, John Stultz, Chris Friesen, Kay Sievers,
	Kirill A. Shutemov, Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel, linux-api, Michael Kerrisk, Alexander Shishkin

On Wed, Mar 09, 2011 at 04:25:13PM -0800, Andrew Morton wrote:
> On Wed,  9 Mar 2011 16:36:51 +0200
> Alexander Shishkin <virtuoso@slind.org> wrote:
> 
> > Changes since v3:
> >  - changed timerfd_settime() semantics (see below)
> > Changes since v2:
> >  - replaced sysfs interface with a syscall
> >  - added sysctl/procfs handle to set a limit to the number of users
> >  - fixed issues pointed out by Greg.
> > Changes since v1:
> >  - updated against 2.6.36-rc1,
> >  - added notification/filtering options,
> >  - added Documentation/ABI/sysfs-kernel-time-notify interface description.
> > 
> > Certain userspace applications (like "clock" desktop applets or cron or
> > systemd) might want to be notified when some other application changes
> > the system time. There are several known to me reasons for this:
> >  - avoiding periodic wakeups to poll time changes;
> >  - rearming CLOCK_REALTIME timers when said changes happen;
> >  - changing system timekeeping policy for system-wide time management
> >    programs;
> >  - keeping guest applications/operating systems running in emulators
> >    up to date;
> >  - recalculation of traffic signal cycles in Advanced Traffic Controllers
> >    (ATC), which is part of ATC API requirements [1] as developed by ATC
> >    working group of the U.S. Institute of Transportation Engineers (ITE).
> > 
> > The major change since the previous version is the new semantics of
> > timerfd_settime() when it's called on a time change notification
> > descriptor: it will set the system time to utmr.it_value if the time
> > change counter is zero, otherwise it will return EBUSY, this is required
> > to prevent a race between setting the time and reading the counter, when
> > the time controlling procees changes the time immediately after another
> > process in the system did the same (the counter is greater than one),
> > that process' time change will be lost. Thus, the time controlling
> > process should use timerfd_settime() instead of clock_settime() or
> > settimeofday() to ensure that other processes' time changes don't get
> > lost.
> > 
> > This is another attempt to approach notifying userspace about system
> > clock changes. The other one is using an eventfd and a syscall [1]. In
> > the course of discussing the necessity of a syscall for this kind of
> > notifications, it was suggested that this functionality can be achieved
> > via timers [2] (and timerfd in particular [3]). This idea got quite
> > some support [4], [5], [6] and some vague criticism [7], so I decided
> > to try and go a bit further with it.
> 
> Looks sane to me.  The changelog is a bit of a pickle - I stole a
> useful-looking paragraph from your other please-ignore-this-email
> email.

I was trying to word it so that I wouldn't have to explain the same things
over and over again after each patch version. :) But your version looks
better.

> I also cc'ed Michael and linux-api: there isn't much point in adding
> kernel features if we don't tell anyone about them.

Yep, they somehow dropped out of my CC list in one of the previous
versions. I can provide a patch for man-pages, of course.

> It would be helpful to know if the identified users of this feature
> actually find it useful and adequate.  I guess the most common
> application is the 1,001 desktop clock widgets.  Do you have any
> feedback from any of the owners of those?

Shaun Reich (KDE) and Ken MacLeod (ITE) have contacted me recently asking
about my progress with this feature. They, as well as other potential
users for this are in CC. Their feedback was how I came up with the list
of usecases above.

Thanks,
--
Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
  2011-03-10  8:02 ` Kirill A. Shutemov
@ 2011-03-10  8:15   ` Alexander Shishkin
  0 siblings, 0 replies; 23+ messages in thread
From: Alexander Shishkin @ 2011-03-10  8:15 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-kernel, Ken MacLeod, Shaun Reich, Thomas Gleixner,
	Alexander Viro, Greg Kroah-Hartman, Feng Tang, Andrew Morton,
	Michael Tokarev, Marcelo Tosatti, John Stultz, Chris Friesen,
	Kay Sievers, Artem Bityutskiy, Davide Libenzi, linux-fsdevel,
	Alexander Shishkin

On Thu, Mar 10, 2011 at 10:02:51AM +0200, Kirill A. Shutemov wrote:
> On Wed, Mar 09, 2011 at 04:36:51PM +0200, Alexander Shishkin wrote:
> > Changes since v3:
> >  - changed timerfd_settime() semantics (see below)
> > Changes since v2:
> >  - replaced sysfs interface with a syscall
> >  - added sysctl/procfs handle to set a limit to the number of users
> >  - fixed issues pointed out by Greg.
> > Changes since v1:
> >  - updated against 2.6.36-rc1,
> >  - added notification/filtering options,
> >  - added Documentation/ABI/sysfs-kernel-time-notify interface description.
> > 
> > Certain userspace applications (like "clock" desktop applets or cron or
> > systemd) might want to be notified when some other application changes
> > the system time. There are several known to me reasons for this:
> >  - avoiding periodic wakeups to poll time changes;
> >  - rearming CLOCK_REALTIME timers when said changes happen;
> >  - changing system timekeeping policy for system-wide time management
> >    programs;
> >  - keeping guest applications/operating systems running in emulators
> >    up to date;
> >  - recalculation of traffic signal cycles in Advanced Traffic Controllers
> >    (ATC), which is part of ATC API requirements [1] as developed by ATC
> >    working group of the U.S. Institute of Transportation Engineers (ITE).
> > 
> > The major change since the previous version is the new semantics of
> > timerfd_settime() when it's called on a time change notification
> > descriptor: it will set the system time to utmr.it_value if the time
> > change counter is zero, otherwise it will return EBUSY, this is required
> > to prevent a race between setting the time and reading the counter, when
> > the time controlling procees changes the time immediately after another
> > process in the system did the same (the counter is greater than one),
> > that process' time change will be lost. Thus, the time controlling
> > process should use timerfd_settime() instead of clock_settime() or
> > settimeofday() to ensure that other processes' time changes don't get
> > lost.
> > 
> > This is another attempt to approach notifying userspace about system
> > clock changes. The other one is using an eventfd and a syscall [1]. In
> > the course of discussing the necessity of a syscall for this kind of
> > notifications, it was suggested that this functionality can be achieved
> > via timers [2] (and timerfd in particular [3]). This idea got quite
> > some support [4], [5], [6] and some vague criticism [7], so I decided
> > to try and go a bit further with it.
> > 
> > [1] http://www.ite.org/standards/atcapi/version2.asp
> > [2] http://marc.info/?l=linux-kernel&m=128950389423614&w=2
> > [3] http://marc.info/?l=linux-kernel&m=128951020831573&w=2
> > [4] http://marc.info/?l=linux-kernel&m=128951588006157&w=2
> > [5] http://marc.info/?l=linux-kernel&m=128951503205111&w=2
> > [6] http://marc.info/?l=linux-kernel&m=128955890118477&w=2
> > [7] http://marc.info/?l=linux-kernel&m=129002967031104&w=2
> > [8] http://marc.info/?l=linux-kernel&m=129002672227263&w=2
> > 
> > Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
> > CC: Thomas Gleixner <tglx@linutronix.de>
> > CC: Alexander Viro <viro@zeniv.linux.org.uk>
> > CC: Greg Kroah-Hartman <gregkh@suse.de>
> > CC: Feng Tang <feng.tang@intel.com>
> > CC: Andrew Morton <akpm@linux-foundation.org>
> > CC: Michael Tokarev <mjt@tls.msk.ru>
> > CC: Marcelo Tosatti <mtosatti@redhat.com>
> > CC: John Stultz <johnstul@us.ibm.com>
> > CC: Chris Friesen <chris.friesen@genband.com>
> > CC: Kay Sievers <kay.sievers@vrfy.org>
> > CC: Kirill A. Shutemov <kirill@shutemov.name>
> > CC: Artem Bityutskiy <dedekind1@gmail.com>
> > CC: Davide Libenzi <davidel@xmailserver.org>
> > CC: linux-fsdevel@vger.kernel.org
> > CC: linux-kernel@vger.kernel.org
> > ---
> >  fs/timerfd.c            |   94 ++++++++++++++++++++++++++++++++++++++++++-----
> >  include/linux/hrtimer.h |    6 +++
> >  include/linux/timerfd.h |    3 +-
> >  kernel/compat.c         |    5 ++-
> >  kernel/hrtimer.c        |    4 ++
> >  kernel/time.c           |   11 ++++-
> >  6 files changed, 109 insertions(+), 14 deletions(-)
> > 
> > diff --git a/fs/timerfd.c b/fs/timerfd.c
> > index 8c4fc14..6170f61 100644
> > --- a/fs/timerfd.c
> > +++ b/fs/timerfd.c
> > @@ -22,6 +22,7 @@
> >  #include <linux/anon_inodes.h>
> >  #include <linux/timerfd.h>
> >  #include <linux/syscalls.h>
> > +#include <linux/security.h>
> >  
> >  struct timerfd_ctx {
> >  	struct hrtimer tmr;
> > @@ -30,8 +31,13 @@ struct timerfd_ctx {
> >  	u64 ticks;
> >  	int expired;
> >  	int clockid;
> > +	struct list_head notifiers_list;
> >  };
> >  
> > +/* TFD_NOTIFY_CLOCK_SET timers go here */
> > +static DEFINE_SPINLOCK(notifiers_lock);
> > +static LIST_HEAD(notifiers_list);
> > +
> >  /*
> >   * This gets called when the timer event triggers. We set the "expired"
> >   * flag, but we do not re-arm the timer (in case it's necessary,
> > @@ -51,10 +57,31 @@ static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
> >  	return HRTIMER_NORESTART;
> >  }
> >  
> > +void timerfd_clock_was_set(clockid_t clockid)
> > +{
> > +	struct timerfd_ctx *ctx;
> > +	unsigned long flags;
> > +
> > +	spin_lock(&notifiers_lock);
> > +	list_for_each_entry(ctx, &notifiers_list, notifiers_list) {
> > +		spin_lock_irqsave(&ctx->wqh.lock, flags);
> > +		if (ctx->tmr.base->index == clockid) {
> > +			ctx->ticks++;
> > +			wake_up_locked(&ctx->wqh);
> > +		}
> > +		spin_unlock_irqrestore(&ctx->wqh.lock, flags);
> > +	}
> > +	spin_unlock(&notifiers_lock);
> > +}
> > +
> >  static ktime_t timerfd_get_remaining(struct timerfd_ctx *ctx)
> >  {
> >  	ktime_t remaining;
> >  
> > +	/* for notification timers, return current time */
> > +	if (!list_empty(&ctx->notifiers_list))
> > +		return timespec_to_ktime(current_kernel_time());
> > +
> >  	remaining = hrtimer_expires_remaining(&ctx->tmr);
> >  	return remaining.tv64 < 0 ? ktime_set(0, 0): remaining;
> >  }
> > @@ -72,6 +99,12 @@ static void timerfd_setup(struct timerfd_ctx *ctx, int flags,
> >  	ctx->expired = 0;
> >  	ctx->ticks = 0;
> >  	ctx->tintv = timespec_to_ktime(ktmr->it_interval);
> > +
> > +	if (flags & TFD_NOTIFY_CLOCK_SET) {
> > +		list_add(&ctx->notifiers_list, &notifiers_list);
> > +		return;
> > +	}
> > +
> >  	hrtimer_init(&ctx->tmr, ctx->clockid, htmode);
> >  	hrtimer_set_expires(&ctx->tmr, texp);
> >  	ctx->tmr.function = timerfd_tmrproc;
> > @@ -83,7 +116,12 @@ static int timerfd_release(struct inode *inode, struct file *file)
> >  {
> >  	struct timerfd_ctx *ctx = file->private_data;
> >  
> > -	hrtimer_cancel(&ctx->tmr);
> > +	if (!list_empty(&ctx->notifiers_list)) {
> > +		spin_lock(&notifiers_lock);
> > +		list_del(&ctx->notifiers_list);
> > +		spin_unlock(&notifiers_lock);
> > +	} else
> > +		hrtimer_cancel(&ctx->tmr);
> >  	kfree(ctx);
> >  	return 0;
> >  }
> > @@ -113,6 +151,7 @@ static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
> >  
> >  	if (count < sizeof(ticks))
> >  		return -EINVAL;
> > +
> >  	spin_lock_irq(&ctx->wqh.lock);
> >  	if (file->f_flags & O_NONBLOCK)
> >  		res = -EAGAIN;
> 
> Whitespace changes?
> 
> > @@ -120,7 +159,8 @@ static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
> >  		res = wait_event_interruptible_locked_irq(ctx->wqh, ctx->ticks);
> >  	if (ctx->ticks) {
> >  		ticks = ctx->ticks;
> > -		if (ctx->expired && ctx->tintv.tv64) {
> > +		if (ctx->expired && ctx->tintv.tv64 &&
> > +		    list_empty(&ctx->notifiers_list)) {
> >  			/*
> >  			 * If tintv.tv64 != 0, this is a periodic timer that
> >  			 * needs to be re-armed. We avoid doing it in the timer
> > @@ -184,6 +224,8 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
> >  	ctx->clockid = clockid;
> >  	hrtimer_init(&ctx->tmr, clockid, HRTIMER_MODE_ABS);
> >  
> > +	INIT_LIST_HEAD(&ctx->notifiers_list);
> > +
> >  	ufd = anon_inode_getfd("[timerfd]", &timerfd_fops, ctx,
> >  			       O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
> >  	if (ufd < 0)
> > @@ -196,18 +238,24 @@ SYSCALL_DEFINE4(timerfd_settime, int, ufd, int, flags,
> >  		const struct itimerspec __user *, utmr,
> >  		struct itimerspec __user *, otmr)
> >  {
> > +	int ret = 0;
> >  	struct file *file;
> >  	struct timerfd_ctx *ctx;
> >  	struct itimerspec ktmr, kotmr;
> >  
> > -	if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
> > -		return -EFAULT;
> > -
> > -	if ((flags & ~TFD_SETTIME_FLAGS) ||
> > -	    !timespec_valid(&ktmr.it_value) ||
> > -	    !timespec_valid(&ktmr.it_interval))
> > +	if (flags & ~TFD_SETTIME_FLAGS)
> >  		return -EINVAL;
> >  
> > +	/* utmr may be NULL for notification timerfd */
> > +	if (!(flags & TFD_NOTIFY_CLOCK_SET) || utmr) {
> > +		if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
> > +			return -EFAULT;
> > +
> > +		if (!timespec_valid(&ktmr.it_value) ||
> > +		    !timespec_valid(&ktmr.it_interval))
> > +			return -EINVAL;
> > +	}
> > +
> >  	file = timerfd_fget(ufd);
> >  	if (IS_ERR(file))
> >  		return PTR_ERR(file);
> > @@ -218,10 +266,12 @@ SYSCALL_DEFINE4(timerfd_settime, int, ufd, int, flags,
> >  	 * it to the new values.
> >  	 */
> >  	for (;;) {
> > +		spin_lock(&notifiers_lock);
> >  		spin_lock_irq(&ctx->wqh.lock);
> > -		if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
> > +		if (!list_empty(&notifiers_list) || hrtimer_try_to_cancel(&ctx->tmr) >= 0)
> >  			break;
> >  		spin_unlock_irq(&ctx->wqh.lock);
> > +		spin_unlock(&notifiers_lock);
> >  		cpu_relax();
> >  	}
> >  
> > @@ -238,16 +288,39 @@ SYSCALL_DEFINE4(timerfd_settime, int, ufd, int, flags,
> >  	kotmr.it_interval = ktime_to_timespec(ctx->tintv);
> >  
> >  	/*
> > +	 * for the notification timerfd, set current time to it_value
> > +	 * if the timer hasn't expired; otherwise someone has changed
> > +	 * the system time to the value that we don't know
> > +	 */
> > +	if (!list_empty(&ctx->notifiers_list) && utmr) {
> > +		if (ctx->ticks) {
> > +			ret = -EBUSY;
> > +			goto out;
> > +		}
> > +
> > +		ret = security_settime(&ktmr.it_value, NULL);
> > +		if (ret)
> > +			goto out;
> > +
> > +		spin_unlock_irq(&ctx->wqh.lock);
> > +		ret = do_settimeofday(&ktmr.it_value);
> > +		goto out1;
> > +	}
> > +
> > +	/*
> >  	 * Re-program the timer to the new value ...
> >  	 */
> >  	timerfd_setup(ctx, flags, &ktmr);
> >  
> > +out:
> >  	spin_unlock_irq(&ctx->wqh.lock);
> > +out1:
> > +	spin_unlock(&notifiers_lock);
> >  	fput(file);
> >  	if (otmr && copy_to_user(otmr, &kotmr, sizeof(kotmr)))
> >  		return -EFAULT;
> 
> What's reason to do copy_to_user() in case of error?

That's a separate issue. security_settime() checks for the capability for
_setting_ the time. This bit is returning the previously effective time
back to user if he/she asked for it.

> Is it safe for error from security_settime()?

The user can still obtain system time by other means.

> 
> > -	return 0;
> > +	return ret;
> >  }
> >  
> >  SYSCALL_DEFINE2(timerfd_gettime, int, ufd, struct itimerspec __user *, otmr)
> > @@ -273,6 +346,7 @@ SYSCALL_DEFINE2(timerfd_gettime, int, ufd, struct itimerspec __user *, otmr)
> >  	spin_unlock_irq(&ctx->wqh.lock);
> >  	fput(file);
> >  
> > +out:
> >  	return copy_to_user(otmr, &kotmr, sizeof(kotmr)) ? -EFAULT: 0;
> >  }
> 
> This lable is not needed.
> 
> > diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
> > index 6bc1804..991e8d9 100644
> > --- a/include/linux/hrtimer.h
> > +++ b/include/linux/hrtimer.h
> > @@ -248,6 +248,12 @@ static inline ktime_t hrtimer_expires_remaining(const struct hrtimer *timer)
> >  	return ktime_sub(timer->node.expires, timer->base->get_time());
> >  }
> >  
> > +#ifdef CONFIG_TIMERFD
> > +extern void timerfd_clock_was_set(clockid_t clockid);
> > +#else
> > +static inline void timerfd_clock_was_set(clockid_t clockid) {}
> > +#endif
> > +
> >  #ifdef CONFIG_HIGH_RES_TIMERS
> >  struct clock_event_device;
> >  
> > diff --git a/include/linux/timerfd.h b/include/linux/timerfd.h
> > index 2d07929..c3ddad9 100644
> > --- a/include/linux/timerfd.h
> > +++ b/include/linux/timerfd.h
> > @@ -19,6 +19,7 @@
> >   * shared O_* flags.
> >   */
> >  #define TFD_TIMER_ABSTIME (1 << 0)
> > +#define TFD_NOTIFY_CLOCK_SET (1 << 1)
> >  #define TFD_CLOEXEC O_CLOEXEC
> >  #define TFD_NONBLOCK O_NONBLOCK
> >  
> > @@ -26,6 +27,6 @@
> >  /* Flags for timerfd_create.  */
> >  #define TFD_CREATE_FLAGS TFD_SHARED_FCNTL_FLAGS
> >  /* Flags for timerfd_settime.  */
> > -#define TFD_SETTIME_FLAGS TFD_TIMER_ABSTIME
> > +#define TFD_SETTIME_FLAGS (TFD_TIMER_ABSTIME | TFD_NOTIFY_CLOCK_SET)
> >  
> >  #endif /* _LINUX_TIMERFD_H */
> > diff --git a/kernel/compat.c b/kernel/compat.c
> > index 38b1d2c..b1cf3e1 100644
> > --- a/kernel/compat.c
> > +++ b/kernel/compat.c
> > @@ -995,7 +995,10 @@ asmlinkage long compat_sys_stime(compat_time_t __user *tptr)
> >  	if (err)
> >  		return err;
> >  
> > -	do_settimeofday(&tv);
> > +	err = do_settimeofday(&tv);
> > +	if (!err)
> > +		timerfd_clock_was_set(CLOCK_REALTIME);
> > +
> >  	return 0;
> >  }
> >  
> > diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> > index 2c3d6e5..469eef6 100644
> > --- a/kernel/hrtimer.c
> > +++ b/kernel/hrtimer.c
> > @@ -663,6 +663,7 @@ void clock_was_set(void)
> >  {
> >  	/* Retrigger the CPU local events everywhere */
> >  	on_each_cpu(retrigger_next_event, NULL, 1);
> > +
> >  }
> >  
> >  /*
> 
> Whitespace changes?
> 
> > @@ -675,6 +676,9 @@ void hres_timers_resume(void)
> >  		  KERN_INFO "hres_timers_resume() called with IRQs enabled!");
> >  
> >  	retrigger_next_event(NULL);
> > +
> > +	/* Trigger timerfd notifiers */
> > +	timerfd_clock_was_set(CLOCK_MONOTONIC);
> >  }
> >  
> >  /*
> > diff --git a/kernel/time.c b/kernel/time.c
> > index 6430a75..b06f759 100644
> > --- a/kernel/time.c
> > +++ b/kernel/time.c
> > @@ -92,7 +92,10 @@ SYSCALL_DEFINE1(stime, time_t __user *, tptr)
> >  	if (err)
> >  		return err;
> >  
> > -	do_settimeofday(&tv);
> > +	err = do_settimeofday(&tv);
> > +	if (!err)
> > +		timerfd_clock_was_set(CLOCK_REALTIME);
> > +
> >  	return 0;
> >  }
> >  
> > @@ -177,7 +180,11 @@ int do_sys_settimeofday(const struct timespec *tv, const struct timezone *tz)
> >  		/* SMP safe, again the code in arch/foo/time.c should
> >  		 * globally block out interrupts when it runs.
> >  		 */
> > -		return do_settimeofday(tv);
> > +		error = do_settimeofday(tv);
> > +		if (!error)
> > +			timerfd_clock_was_set(CLOCK_REALTIME);
> > +
> > +		return error;
> >  	}
> >  	return 0;
> >  }
> > -- 
> > 1.7.2.1.45.gb66c2
> > 
> 
> -- 
>  Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
  2011-03-10  0:36       ` Kay Sievers
@ 2011-03-10  8:19         ` Alexander Shishkin
  2011-03-10  9:08         ` Thomas Gleixner
  1 sibling, 0 replies; 23+ messages in thread
From: Alexander Shishkin @ 2011-03-10  8:19 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Andrew Morton, linux-kernel, Ken MacLeod, Shaun Reich,
	Thomas Gleixner, Alexander Viro, Greg Kroah-Hartman, Feng Tang,
	Michael Tokarev, Marcelo Tosatti, John Stultz, Chris Friesen,
	Kirill A. Shutemov, Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel, linux-api, Michael Kerrisk, Alexander Shishkin

On Thu, Mar 10, 2011 at 01:36:18AM +0100, Kay Sievers wrote:
> On Thu, Mar 10, 2011 at 01:25, Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Wed,  9 Mar 2011 16:36:51 +0200
> > Alexander Shishkin <virtuoso@slind.org> wrote:
> >
> >> Changes since v3:
> >>  - changed timerfd_settime() semantics (see below)
> >> Changes since v2:
> >>  - replaced sysfs interface with a syscall
> >>  - added sysctl/procfs handle to set a limit to the number of users
> >>  - fixed issues pointed out by Greg.
> >> Changes since v1:
> >>  - updated against 2.6.36-rc1,
> >>  - added notification/filtering options,
> >>  - added Documentation/ABI/sysfs-kernel-time-notify interface description.
> 
> > It would be helpful to know if the identified users of this feature
> > actually find it useful and adequate.  I guess the most common
> > application is the 1,001 desktop clock widgets.  Do you have any
> > feedback from any of the owners of those?
> 
> We want it for systemd, to provide cron-like functionality but without
> the need to stupidly wake up every minute and check the system time
> for possible jumps.
> 
> It also sounds useful for a generic resume (after system suspend)
> notification for applications, which isn't really possible today.

Yes, but with John's CLOCK_BOOTTIME patches (which are in the tip tree
currently) it will be.

Regards,
--
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
       [not found]         ` <AANLkTin7=rK2Sc2D7aMhPnPWU9LJq7KMSvwxv_PHERcK-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-03-10  8:25           ` Andrew Morton
       [not found]             ` <20110310002534.f984f8b2.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2011-03-10  8:25 UTC (permalink / raw)
  To: Scott James Remnant
  Cc: Alexander Shishkin, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Ken MacLeod, Shaun Reich, Thomas Gleixner, Alexander Viro,
	Greg Kroah-Hartman, Feng Tang, Michael Tokarev, Marcelo Tosatti,
	John Stultz, Chris Friesen, Kay Sievers, Kirill A. Shutemov,
	Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk

On Wed, 9 Mar 2011 18:01:09 -0800 Scott James Remnant <scott-Umf49k1wg4FWk0Htik3J/w@public.gmane.org> wrote:

> > It would be helpful to know if the identified users of this feature
> > actually find it useful and adequate. __I guess the most common
> > application is the 1,001 desktop clock widgets. __Do you have any
> > feedback from any of the owners of those?
> >
> cron is another obvious one (or init systems attempting to replace
> cron). Having to wakeup and check the time every minute can be
> non-conducive to power savings, it would be better if we could just
> sleep until the next alarm and be woken up if the time changes in
> between.
> 
> (That being said, we also need to poll for and/or check for timezone
> changes - but those are entirely userspace, so we can deal with that
> separately)

Sure, there will be lots of applications.

But what I'm asking isn't "it is a good feature".  I'm asking "is the
feature implemented well".  Ideally someone would get down and modify
cron to use the interface in this patch.

That's a bit of work (although not a huge amount) and a compromise
would be for app owners to sit down for half an hour and work through
their code and the kernel interface and let us know whether they found
the interface to be good and complete.

Because it would be most regrettable if we were to roll this feature
out and then three months later its users come back saying "what a
shame you didn't do it *this* way".

Also...  Alexander, I assume you have a userspace test app?  Please
send it and we'll add it to the changelog for testers, as example code
for implementers and as an additional way of reviewing the proposed
interface.

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
  2011-03-09 14:36 [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes Alexander Shishkin
       [not found] ` <1299681411-9227-1-git-send-email-virtuoso-0lOfPCoBze7YtjvyW6yDsg@public.gmane.org>
  2011-03-10  8:02 ` Kirill A. Shutemov
@ 2011-03-10  8:48 ` Arnd Bergmann
  2011-03-10 14:19   ` Alexander Shishkin
  2011-03-10  9:52 ` Thomas Gleixner
  3 siblings, 1 reply; 23+ messages in thread
From: Arnd Bergmann @ 2011-03-10  8:48 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: linux-kernel, Ken MacLeod, Shaun Reich, Thomas Gleixner,
	Alexander Viro, Greg Kroah-Hartman, Feng Tang, Andrew Morton,
	Michael Tokarev, Marcelo Tosatti, John Stultz, Chris Friesen,
	Kay Sievers, Kirill A. Shutemov, Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel

On Wednesday 09 March 2011 15:36:51 Alexander Shishkin wrote:
> This is another attempt to approach notifying userspace about system
> clock changes. The other one is using an eventfd and a syscall [1]. In
> the course of discussing the necessity of a syscall for this kind of
> notifications, it was suggested that this functionality can be achieved
> via timers [2] (and timerfd in particular [3]). This idea got quite
> some support [4], [5], [6] and some vague criticism [7], so I decided
> to try and go a bit further with it.

I don't understand from your description or from the patch how user
space gets notified. From your description, I would have expected
to see a change to the timerfd_poll() function to check if the
clock has changed, possibly returning POLLPRI, but the only such
change I can see is in the timerfd_read() function. Don't you need
to change poll() so that a task knows when to call read()?

> +/* TFD_NOTIFY_CLOCK_SET timers go here */
> +static DEFINE_SPINLOCK(notifiers_lock);
> +static LIST_HEAD(notifiers_list);

Maybe I was the only one to be confused by this, but I think t
he naming is slightly misleading, because it's easy to confuse
with a struct notifier_block. You could of course use the
notifier API instead of building your own, but if you don't,
I'd recommend coming up with a different name.

I also think that a mutex would be better here than a spinlock.
It's unlikely to be under heavy contention, but if you have
a lot of threads, it could be held for a significant amount
of time.

	Arnd

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
  2011-03-10  0:36       ` Kay Sievers
  2011-03-10  8:19         ` Alexander Shishkin
@ 2011-03-10  9:08         ` Thomas Gleixner
       [not found]           ` <alpine.LFD.2.00.1103101004540.2787-bi+AKbBUZKagILUCTcTcHdKyNwTtLsGr@public.gmane.org>
  1 sibling, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2011-03-10  9:08 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Andrew Morton, Alexander Shishkin, linux-kernel, Ken MacLeod,
	Shaun Reich, Alexander Viro, Greg Kroah-Hartman, Feng Tang,
	Michael Tokarev, Marcelo Tosatti, John Stultz, Chris Friesen,
	Kirill A. Shutemov, Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel, linux-api, Michael Kerrisk

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1588 bytes --]

On Thu, 10 Mar 2011, Kay Sievers wrote:

> On Thu, Mar 10, 2011 at 01:25, Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Wed,  9 Mar 2011 16:36:51 +0200
> > Alexander Shishkin <virtuoso@slind.org> wrote:
> >
> >> Changes since v3:
> >>  - changed timerfd_settime() semantics (see below)
> >> Changes since v2:
> >>  - replaced sysfs interface with a syscall
> >>  - added sysctl/procfs handle to set a limit to the number of users
> >>  - fixed issues pointed out by Greg.
> >> Changes since v1:
> >>  - updated against 2.6.36-rc1,
> >>  - added notification/filtering options,
> >>  - added Documentation/ABI/sysfs-kernel-time-notify interface description.
> 
> > It would be helpful to know if the identified users of this feature
> > actually find it useful and adequate.  I guess the most common
> > application is the 1,001 desktop clock widgets.  Do you have any
> > feedback from any of the owners of those?
> 
> We want it for systemd, to provide cron-like functionality but without
> the need to stupidly wake up every minute and check the system time
> for possible jumps.
> 
> It also sounds useful for a generic resume (after system suspend)
> notification for applications, which isn't really possible today.

Note, that we have CLOCK_BOOTTIME pending for .39 which aims at the
same problem. It's basically CLOCK_MONOTONIC adjusted by the time we
were in suspend. So while CLOCK_MONOTONIC timers are not aware of the
time spent in suspend CLOCK_BOOTTIME timers are. The reason for
implementing CLOCK_BOOTTIME was basically the same problem.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
  2011-03-09 14:36 [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes Alexander Shishkin
                   ` (2 preceding siblings ...)
  2011-03-10  8:48 ` Arnd Bergmann
@ 2011-03-10  9:52 ` Thomas Gleixner
  2011-03-10 14:12   ` Alexander Shishkin
  3 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2011-03-10  9:52 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: linux-kernel, Ken MacLeod, Shaun Reich, Alexander Viro,
	Greg Kroah-Hartman, Feng Tang, Andrew Morton, Michael Tokarev,
	Marcelo Tosatti, John Stultz, Chris Friesen, Kay Sievers,
	Kirill A. Shutemov, Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel

On Wed, 9 Mar 2011, Alexander Shishkin wrote:
> The major change since the previous version is the new semantics of
> timerfd_settime() when it's called on a time change notification
> descriptor: it will set the system time to utmr.it_value if the time
> change counter is zero, otherwise it will return EBUSY, this is required
> to prevent a race between setting the time and reading the counter, when
> the time controlling procees changes the time immediately after another
> process in the system did the same (the counter is greater than one),
> that process' time change will be lost. Thus, the time controlling
> process should use timerfd_settime() instead of clock_settime() or
> settimeofday() to ensure that other processes' time changes don't get
> lost.

No, we really don't want to go there and invent another mechanism to
set the time.

>  	/*
> +	 * for the notification timerfd, set current time to it_value
> +	 * if the timer hasn't expired; otherwise someone has changed
> +	 * the system time to the value that we don't know
> +	 */
> +	if (!list_empty(&ctx->notifiers_list) && utmr) {
> +		if (ctx->ticks) {
> +			ret = -EBUSY;
> +			goto out;
> +		}
> +
> +		ret = security_settime(&ktmr.it_value, NULL);
> +		if (ret)
> +			goto out;
> +
> +		spin_unlock_irq(&ctx->wqh.lock);
> +		ret = do_settimeofday(&ktmr.it_value);
> +		goto out1;
> +	}

And how does that solve the problem of multiple processes using that
interface? Not at all. You moved the timer_fd_clock_was_set()
notification into the syscalls so you do not deadlock on the
notifier_lock when you call do_settimeofday() here. So if you have
multiple users of notification fd then they do not notice that you
changed the time here. That's a half thought hack, really.

And you start to overload timerfd in a way which is really horrible.
The proposed semantics of timerfd_settime() with utmr == NULL or utmr
!= NULL depending on the notification flag are so non obvious that Joe
user space programmer is doomed to fail.

The problem you want to solve is:

   Wakeup CLOCK_REALTIME timers which are not yet expired, when the
   time is set backward.

That's at least what you said you wanted to solve. I regret already
that I suggested adding that flag to timerfd, as it was only meant to
provide an interface which wakes a non expired timer whenever clock
was set.

The patch does something different. How is this related to the problem
you wanted to solve in the first place?

Can you please explain which problems you identified aside of the
initial one?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
       [not found]           ` <alpine.LFD.2.00.1103101004540.2787-bi+AKbBUZKagILUCTcTcHdKyNwTtLsGr@public.gmane.org>
@ 2011-03-10 11:16             ` Jamie Lokier
       [not found]               ` <20110310111622.GH29234-yetKDKU6eevNLxjTenLetw@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Jamie Lokier @ 2011-03-10 11:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kay Sievers, Andrew Morton, Alexander Shishkin,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ken MacLeod, Shaun Reich,
	Alexander Viro, Greg Kroah-Hartman, Feng Tang, Michael Tokarev,
	Marcelo Tosatti, John Stultz, Chris Friesen, Kirill A. Shutemov,
	Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk

Thomas Gleixner wrote:
> Note, that we have CLOCK_BOOTTIME pending for .39 which aims at the
> same problem. It's basically CLOCK_MONOTONIC adjusted by the time we
> were in suspend. So while CLOCK_MONOTONIC timers are not aware of the
> time spent in suspend CLOCK_BOOTTIME timers are. The reason for
> implementing CLOCK_BOOTTIME was basically the same problem.

I'm afraid for coherent distributed system problems,
I don't trust CLOCK_BOOTTIME.

What happens when the clock battery is flat?  (Some systems have
separate battery for the clock, and it's never changed or recharged).

What about systems that just don't have a hardware clock while
suspended, or the clock doesn't remember the current year reliably, or
it's handled by userspace not the kernel?

(I have a system here where the clock battery will eventually run
down, and which has a userspace-only hwclock driver)

What happens if user does suspend to disk and resumes the disk image
after they used a different OS for a while, which has meanwhile also
altered the clock?

Or suspend to disk on a VM followed by moving to a different VM host.

In general I trust CLOCK_BOOTTIME to be a reasonable measure of
elapsed time most of the time - but not reliable enough for
distributed systems (such as coherent caches) that need stricter
guarantees whatever the client hardware, or need to know when those
guarantees aren't met.

Whereas I'd trust an "something happened so recalibate" event that is
always triggered - provided it's not sent too early or too late
relative to clock measurements and timer queue reads.  I've yet to
check if these proposed timerfd events meet that criterion.

-- Jamie

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
       [not found]               ` <20110310111622.GH29234-yetKDKU6eevNLxjTenLetw@public.gmane.org>
@ 2011-03-10 11:41                 ` Thomas Gleixner
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2011-03-10 11:41 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Kay Sievers, Andrew Morton, Alexander Shishkin,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ken MacLeod, Shaun Reich,
	Alexander Viro, Greg Kroah-Hartman, Feng Tang, Michael Tokarev,
	Marcelo Tosatti, John Stultz, Chris Friesen, Kirill A. Shutemov,
	Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk

On Thu, 10 Mar 2011, Jamie Lokier wrote:
> Thomas Gleixner wrote:
> > Note, that we have CLOCK_BOOTTIME pending for .39 which aims at the
> > same problem. It's basically CLOCK_MONOTONIC adjusted by the time we
> > were in suspend. So while CLOCK_MONOTONIC timers are not aware of the
> > time spent in suspend CLOCK_BOOTTIME timers are. The reason for
> > implementing CLOCK_BOOTTIME was basically the same problem.
> 
> I'm afraid for coherent distributed system problems,
> I don't trust CLOCK_BOOTTIME.

That timerfd thing as proposed will not solve that either.

> What happens when the clock battery is flat?  (Some systems have
> separate battery for the clock, and it's never changed or recharged).

Then your timekeeping is hosed anyway, so that's the least of your
worries.

> What about systems that just don't have a hardware clock while
> suspended, or the clock doesn't remember the current year reliably, or
> it's handled by userspace not the kernel?
> 
> (I have a system here where the clock battery will eventually run
> down, and which has a userspace-only hwclock driver)

Ditto. And I really do not care about user space only drivers at all.
 
> What happens if user does suspend to disk and resumes the disk image
> after they used a different OS for a while, which has meanwhile also
> altered the clock?

Again, your timekeeping is busted.

> Or suspend to disk on a VM followed by moving to a different VM host.

If your hosts have a complete different notion of clock realtime or
provide complete different based RTC emulation, then you run into a
whole set of other problems.

> In general I trust CLOCK_BOOTTIME to be a reasonable measure of
> elapsed time most of the time - but not reliable enough for
> distributed systems (such as coherent caches) that need stricter
> guarantees whatever the client hardware, or need to know when those
> guarantees aren't met.
> 
> Whereas I'd trust an "something happened so recalibate" event that is
> always triggered - provided it's not sent too early or too late
> relative to clock measurements and timer queue reads.  I've yet to
> check if these proposed timerfd events meet that criterion.

Not at all. There is no guarantee, that the process waiting for that
timerfd notification will resume before any other process which might
be affected by such an event.

You try to square the circle, but that won't happen before pigs fly.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
  2011-03-10  9:52 ` Thomas Gleixner
@ 2011-03-10 14:12   ` Alexander Shishkin
  2011-03-10 14:55     ` Thomas Gleixner
  2011-03-10 21:57     ` Thomas Gleixner
  0 siblings, 2 replies; 23+ messages in thread
From: Alexander Shishkin @ 2011-03-10 14:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Ken MacLeod, Shaun Reich, Alexander Viro,
	Greg Kroah-Hartman, Feng Tang, Andrew Morton, Michael Tokarev,
	Marcelo Tosatti, John Stultz, Chris Friesen, Kay Sievers,
	Kirill A. Shutemov, Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel, Alexander Shishkin

On Thu, Mar 10, 2011 at 10:52:18AM +0100, Thomas Gleixner wrote:
> On Wed, 9 Mar 2011, Alexander Shishkin wrote:
> > The major change since the previous version is the new semantics of
> > timerfd_settime() when it's called on a time change notification
> > descriptor: it will set the system time to utmr.it_value if the time
> > change counter is zero, otherwise it will return EBUSY, this is required
> > to prevent a race between setting the time and reading the counter, when
> > the time controlling procees changes the time immediately after another
> > process in the system did the same (the counter is greater than one),
> > that process' time change will be lost. Thus, the time controlling
> > process should use timerfd_settime() instead of clock_settime() or
> > settimeofday() to ensure that other processes' time changes don't get
> > lost.
> 
> No, we really don't want to go there and invent another mechanism to
> set the time.
> 
> >  	/*
> > +	 * for the notification timerfd, set current time to it_value
> > +	 * if the timer hasn't expired; otherwise someone has changed
> > +	 * the system time to the value that we don't know
> > +	 */
> > +	if (!list_empty(&ctx->notifiers_list) && utmr) {
> > +		if (ctx->ticks) {
> > +			ret = -EBUSY;
> > +			goto out;
> > +		}
> > +
> > +		ret = security_settime(&ktmr.it_value, NULL);
> > +		if (ret)
> > +			goto out;
> > +
> > +		spin_unlock_irq(&ctx->wqh.lock);
> > +		ret = do_settimeofday(&ktmr.it_value);
> > +		goto out1;
> > +	}
> 
> And how does that solve the problem of multiple processes using that
> interface? Not at all. You moved the timer_fd_clock_was_set()
> notification into the syscalls so you do not deadlock on the
> notifier_lock when you call do_settimeofday() here. So if you have
> multiple users of notification fd then they do not notice that you
> changed the time here. That's a half thought hack, really.

Indeed, you're right here.

> And you start to overload timerfd in a way which is really horrible.
> The proposed semantics of timerfd_settime() with utmr == NULL or utmr
> != NULL depending on the notification flag are so non obvious that Joe
> user space programmer is doomed to fail.
> 
> The problem you want to solve is:
> 
>    Wakeup CLOCK_REALTIME timers which are not yet expired, when the
>    time is set backward.

"...when the time is set", yes.

> That's at least what you said you wanted to solve. I regret already
> that I suggested adding that flag to timerfd, as it was only meant to
> provide an interface which wakes a non expired timer whenever clock
> was set.

Yes. Except for in our usecase here and other usecases listed in the patch
description, there doesn't necessarily have to be a timer set to expire in
future. In some cases programs simply want to be notified when the time
changes. However, systemd or crond wouldn't (or shouldn't, in any case)
really care about time changes unless they have scheduled tasks.

I'm not sure it's worth it to always start a timer in order to get these
notifications. On the other hand, it fits much better in the timer/timerfd
interface than what I currently have.

> The patch does something different. How is this related to the problem
> you wanted to solve in the first place?

Well, if you scratch the timerfd_settime() bit, it kind of addresses the
initial problem. The timerfd_settime() was indeed a mistake.

> Can you please explain which problems you identified aside of the
> initial one?

Sure. The time daemon that we have here has to stop automatic time updates
when some other program changes system time *and* keep that setting
effective. Currently, when "the other program" changes the system time
right before time daemon changes it, this time setting will be overwritten
and lost. I'm thinking that it could be solved with something like

  clock_swaptime(clockid, new_timespec, old_timespec);

but something tells me that it will not be welcome either.

Thanks,
--
Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
  2011-03-10  8:48 ` Arnd Bergmann
@ 2011-03-10 14:19   ` Alexander Shishkin
  0 siblings, 0 replies; 23+ messages in thread
From: Alexander Shishkin @ 2011-03-10 14:19 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-kernel, Ken MacLeod, Shaun Reich, Thomas Gleixner,
	Alexander Viro, Greg Kroah-Hartman, Feng Tang, Andrew Morton,
	Michael Tokarev, Marcelo Tosatti, John Stultz, Chris Friesen,
	Kay Sievers, Kirill A. Shutemov, Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel, Alexander Shishkin

On Thu, Mar 10, 2011 at 09:48:03AM +0100, Arnd Bergmann wrote:
> On Wednesday 09 March 2011 15:36:51 Alexander Shishkin wrote:
> > This is another attempt to approach notifying userspace about system
> > clock changes. The other one is using an eventfd and a syscall [1]. In
> > the course of discussing the necessity of a syscall for this kind of
> > notifications, it was suggested that this functionality can be achieved
> > via timers [2] (and timerfd in particular [3]). This idea got quite
> > some support [4], [5], [6] and some vague criticism [7], so I decided
> > to try and go a bit further with it.
> 
> I don't understand from your description or from the patch how user
> space gets notified. From your description, I would have expected
> to see a change to the timerfd_poll() function to check if the
> clock has changed, possibly returning POLLPRI, but the only such
> change I can see is in the timerfd_read() function. Don't you need
> to change poll() so that a task knows when to call read()?

Luckily, no changes for the _poll function were required, because the
notification code reuses the ctx->ticks counter of timerfds. IOW, poll
wakes up in the same way as before.

> > +/* TFD_NOTIFY_CLOCK_SET timers go here */
> > +static DEFINE_SPINLOCK(notifiers_lock);
> > +static LIST_HEAD(notifiers_list);
> 
> Maybe I was the only one to be confused by this, but I think t
> he naming is slightly misleading, because it's easy to confuse
> with a struct notifier_block. You could of course use the
> notifier API instead of building your own, but if you don't,
> I'd recommend coming up with a different name.

Point taken.

> I also think that a mutex would be better here than a spinlock.
> It's unlikely to be under heavy contention, but if you have
> a lot of threads, it could be held for a significant amount
> of time.

Indeed, thanks!

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
  2011-03-10 14:12   ` Alexander Shishkin
@ 2011-03-10 14:55     ` Thomas Gleixner
  2011-03-10 15:43       ` Alexander Shishkin
  2011-03-10 21:57     ` Thomas Gleixner
  1 sibling, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2011-03-10 14:55 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: linux-kernel, Ken MacLeod, Shaun Reich, Alexander Viro,
	Greg Kroah-Hartman, Feng Tang, Andrew Morton, Michael Tokarev,
	Marcelo Tosatti, John Stultz, Chris Friesen, Kay Sievers,
	Kirill A. Shutemov, Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel

On Thu, 10 Mar 2011, Alexander Shishkin wrote:
> On Thu, Mar 10, 2011 at 10:52:18AM +0100, Thomas Gleixner wrote:
> > On Wed, 9 Mar 2011, Alexander Shishkin wrote:
> > The patch does something different. How is this related to the problem
> > you wanted to solve in the first place?
> 
> Well, if you scratch the timerfd_settime() bit, it kind of addresses the
> initial problem. The timerfd_settime() was indeed a mistake.
> 
> > Can you please explain which problems you identified aside of the
> > initial one?
> 
> Sure. The time daemon that we have here has to stop automatic time updates
> when some other program changes system time *and* keep that setting
> effective. Currently, when "the other program" changes the system time
> right before time daemon changes it, this time setting will be overwritten
> and lost. I'm thinking that it could be solved with something like
> 
>   clock_swaptime(clockid, new_timespec, old_timespec);
> 
> but something tells me that it will not be welcome either.

What's that time daemon doing? The semantics of updating system time,
but stopping to do so when something else sets the time sounds more
like a design problem than anything else.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
  2011-03-10 14:55     ` Thomas Gleixner
@ 2011-03-10 15:43       ` Alexander Shishkin
  2011-03-10 16:40         ` Thomas Gleixner
  0 siblings, 1 reply; 23+ messages in thread
From: Alexander Shishkin @ 2011-03-10 15:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Ken MacLeod, Shaun Reich, Alexander Viro,
	Greg Kroah-Hartman, Feng Tang, Andrew Morton, Michael Tokarev,
	Marcelo Tosatti, John Stultz, Chris Friesen, Kay Sievers,
	Kirill A. Shutemov, Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel, Alexander Shishkin

On Thu, Mar 10, 2011 at 03:55:00PM +0100, Thomas Gleixner wrote:
> On Thu, 10 Mar 2011, Alexander Shishkin wrote:
> > On Thu, Mar 10, 2011 at 10:52:18AM +0100, Thomas Gleixner wrote:
> > > On Wed, 9 Mar 2011, Alexander Shishkin wrote:
> > > The patch does something different. How is this related to the problem
> > > you wanted to solve in the first place?
> > 
> > Well, if you scratch the timerfd_settime() bit, it kind of addresses the
> > initial problem. The timerfd_settime() was indeed a mistake.
> > 
> > > Can you please explain which problems you identified aside of the
> > > initial one?
> > 
> > Sure. The time daemon that we have here has to stop automatic time updates
> > when some other program changes system time *and* keep that setting
> > effective. Currently, when "the other program" changes the system time
> > right before time daemon changes it, this time setting will be overwritten
> > and lost. I'm thinking that it could be solved with something like
> > 
> >   clock_swaptime(clockid, new_timespec, old_timespec);
> > 
> > but something tells me that it will not be welcome either.
> 
> What's that time daemon doing? The semantics of updating system time,
> but stopping to do so when something else sets the time sounds more
> like a design problem than anything else.

The daemon's synchronizing system time with various sources like GSM base
stations, time servers etc, but only until something else touches the time
in the system, which would basically mean that the user has installed a
3rd-party application that's controlling system time or just called `date`.
I don't know all the reasons for this requirement, but it seems that not
losing time changes to a race is not a bad idea. Of course, if anyone cares.

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
  2011-03-10 15:43       ` Alexander Shishkin
@ 2011-03-10 16:40         ` Thomas Gleixner
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2011-03-10 16:40 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: linux-kernel, Ken MacLeod, Shaun Reich, Alexander Viro,
	Greg Kroah-Hartman, Feng Tang, Andrew Morton, Michael Tokarev,
	Marcelo Tosatti, John Stultz, Chris Friesen, Kay Sievers,
	Kirill A. Shutemov, Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel

On Thu, 10 Mar 2011, Alexander Shishkin wrote:
> On Thu, Mar 10, 2011 at 03:55:00PM +0100, Thomas Gleixner wrote:
> > On Thu, 10 Mar 2011, Alexander Shishkin wrote:
> > > On Thu, Mar 10, 2011 at 10:52:18AM +0100, Thomas Gleixner wrote:
> > > > On Wed, 9 Mar 2011, Alexander Shishkin wrote:
> > > > The patch does something different. How is this related to the problem
> > > > you wanted to solve in the first place?
> > > 
> > > Well, if you scratch the timerfd_settime() bit, it kind of addresses the
> > > initial problem. The timerfd_settime() was indeed a mistake.
> > > 
> > > > Can you please explain which problems you identified aside of the
> > > > initial one?
> > > 
> > > Sure. The time daemon that we have here has to stop automatic time updates
> > > when some other program changes system time *and* keep that setting
> > > effective. Currently, when "the other program" changes the system time
> > > right before time daemon changes it, this time setting will be overwritten
> > > and lost. I'm thinking that it could be solved with something like
> > > 
> > >   clock_swaptime(clockid, new_timespec, old_timespec);
> > > 
> > > but something tells me that it will not be welcome either.
> > 
> > What's that time daemon doing? The semantics of updating system time,
> > but stopping to do so when something else sets the time sounds more
> > like a design problem than anything else.
> 
> The daemon's synchronizing system time with various sources like GSM base
> stations, time servers etc, but only until something else touches the time
> in the system, which would basically mean that the user has installed a
> 3rd-party application that's controlling system time or just called `date`.

Well, having several different applications fiddling with
settimeofday() is not a good idea to begin with. If you have that,
then there is no way to avoid races or inconsistencies.

There is no restriction of issuing settimeofday() or clock_settime()
on any standard Linux system other than security_settime(). Which is
fine. When I have the permission to issue 'date -s' then I better know
that I'm screwing over whatever is responsible for maintaining time on
my machine. Same applies for installing applications which fiddle with
time. My package manager usually makes sure, that I don't install two
NTP daemons, but nothing prevents me to launch another one when I'm on
a root shell. If stuff explodes in my face, then I'm to blame nothing
else.

> I don't know all the reasons for this requirement, but it seems that not
> losing time changes to a race is not a bad idea. Of course, if anyone cares.

Unless I'm missing something then this requirement is based on the
wish to shorten the rope with which you can hang yourself, but fails
to make it short enough to prevent it. So what's the point?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
  2011-03-10 14:12   ` Alexander Shishkin
  2011-03-10 14:55     ` Thomas Gleixner
@ 2011-03-10 21:57     ` Thomas Gleixner
  1 sibling, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2011-03-10 21:57 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: linux-kernel, Ken MacLeod, Shaun Reich, Alexander Viro,
	Greg Kroah-Hartman, Feng Tang, Andrew Morton, Michael Tokarev,
	Marcelo Tosatti, John Stultz, Chris Friesen, Kay Sievers,
	Kirill A. Shutemov, Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel

On Thu, 10 Mar 2011, Alexander Shishkin wrote:
> On Thu, Mar 10, 2011 at 10:52:18AM +0100, Thomas Gleixner wrote:
> Sure. The time daemon that we have here has to stop automatic time updates
> when some other program changes system time *and* keep that setting
> effective. Currently, when "the other program" changes the system time
> right before time daemon changes it, this time setting will be overwritten
> and lost. I'm thinking that it could be solved with something like
> 
>   clock_swaptime(clockid, new_timespec, old_timespec);
> 
> but something tells me that it will not be welcome either.

Aside of that it wont work. You don't have a reference what
old_timespec means.

The whole problem space is full of race conditions and always will be
a horrible hackery when we try to piggy pack on clock_was_set() as we
have no idea what and when it actually happened. clock_was_set() is
async. While we can somehow get an event on a counter which tells us
that the clock was set, any attempt to return useful information aside
of the fact that the counter changed is going to be inconsistent one
way or the other.

It really takes some more to make this consistent for all the use
cases which are interested in notifications and unconditional timer
cancellation when the underlying clock was set.

After twisting my brain around the corner cases for a while I think
the only feasible approach to avoid all the lurking races is to:

1) Provide a syscall which returns the current offset of
   CLOCK_REALTIME vs. CLOCK_MONOTONIC. That offset is changed when
   CLOCK_REALTIME is set.

2) Provide a mechanism to check consistently the CLOCK_REALTIME
   vs. CLOCK_MONOTONIC offset and notify about changes.

3) Extend the clock_nanosleep() flags with TIMER_CANCEL_ON_CLOCK_SET

   When the flag is set, then the rmtp pointer, which is currently
   used to copy the remaining time to user space must contain a valid
   pointer to the previously retrieved CLOCK_REALTIME offset.

   clock_nanosleep() then checks that user space provided offset under
   #2 and hooks the caller into the notification mechanism. If the
   offset has changed before the timer is enqueued the syscall returns
   immediately with an appropriate error code. If the offset changes
   after the check, then an eventually enqueued timer will be
   cancelled and an appropriate error code returned.

   Note: This wont work for signal based timers as we have no sane way
   to notify user space about a forced cancellation of the timer. Even
   if we could think about some extra signal for this, it's not worth
   the trouble and the mess it's going to create.

4) Extend timerfd_settime() as #3 if necessary

   I'd prefer to avoid that, but I can see the charm of the poll
   facility which is provided by timerfd.

   Again we could reuse the omtr pointer of timerfd_settime() to
   provide the offset as an incoming parameter when the corresponing
   flag is set and basically do the same thing as clock_nanosleep() in
   the setup path - check the offset consistently.

   It needs some thought on the return values from poll and how to
   handle read, but that's a solvable problem as we can reasonably
   restrict this functionality to non self rearming timers.

That should solve the most urgent problem of cron alike battery
wasters. It also should be a reasonable notification mechanism for
others who are just interested in the fact that clock was set as those
can simply arm a timer which expires somewhere in the next decade. If
clock is not set within that time frame then battery life wont suffer
from that once in a decade regular timer expiry wakeup.

It's not going to solve the "stop updating time when something else
set the clock" requirement, but as I argued before there is no point
to even think about that at all.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
       [not found]             ` <20110310002534.f984f8b2.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2011-03-11 19:51               ` Scott James Remnant
       [not found]                 ` <AANLkTi=_DG8uYLxDnXF=so5daUE=w-pxL67mYaFAGkkL-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Scott James Remnant @ 2011-03-11 19:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Shishkin, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Ken MacLeod, Shaun Reich, Thomas Gleixner, Alexander Viro,
	Greg Kroah-Hartman, Feng Tang, Michael Tokarev, Marcelo Tosatti,
	John Stultz, Chris Friesen, Kay Sievers, Kirill A. Shutemov,
	Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk

On Thu, Mar 10, 2011 at 12:25 AM, Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
> On Wed, 9 Mar 2011 18:01:09 -0800 Scott James Remnant <scott@netsplit.com> wrote:
>
>> > It would be helpful to know if the identified users of this feature
>> > actually find it useful and adequate. __I guess the most common
>> > application is the 1,001 desktop clock widgets. __Do you have any
>> > feedback from any of the owners of those?
>> >
>> cron is another obvious one (or init systems attempting to replace
>> cron). Having to wakeup and check the time every minute can be
>> non-conducive to power savings, it would be better if we could just
>> sleep until the next alarm and be woken up if the time changes in
>> between.
>>
>> (That being said, we also need to poll for and/or check for timezone
>> changes - but those are entirely userspace, so we can deal with that
>> separately)
>
> Sure, there will be lots of applications.
>
> But what I'm asking isn't "it is a good feature".  I'm asking "is the
> feature implemented well".  Ideally someone would get down and modify
> cron to use the interface in this patch.
>
So I've just been thinking today - and I'm actually not sure whether
this is needed at all for this case.

A good cron implementation is going to set timers according to
CLOCK_REALTIME; in the case where the clock changes forwards, those
timers will fire as part of the clock changing already no? And in the
case where the clock changes backwards, you don't want to re-run old
ones anyway.

Even the hourly/daily cases are actually at a fixed time, so would be
triggered - and a decent implementation wouldn't trigger a given
script more than once.

I'm going to test this in userspace shortly to see whether it's the case.

Scott

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
       [not found]                 ` <AANLkTi=_DG8uYLxDnXF=so5daUE=w-pxL67mYaFAGkkL-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-03-11 19:56                   ` Thomas Gleixner
  2011-03-15  1:53                     ` Scott James Remnant
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2011-03-11 19:56 UTC (permalink / raw)
  To: Scott James Remnant
  Cc: Andrew Morton, Alexander Shishkin,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ken MacLeod, Shaun Reich,
	Alexander Viro, Greg Kroah-Hartman, Feng Tang, Michael Tokarev,
	Marcelo Tosatti, John Stultz, Chris Friesen, Kay Sievers,
	Kirill A. Shutemov, Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2023 bytes --]

On Fri, 11 Mar 2011, Scott James Remnant wrote:
> On Thu, Mar 10, 2011 at 12:25 AM, Andrew Morton
> <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
> > On Wed, 9 Mar 2011 18:01:09 -0800 Scott James Remnant <scott-Umf49k1wg4FWk0Htik3J/w@public.gmane.org> wrote:
> >
> >> > It would be helpful to know if the identified users of this feature
> >> > actually find it useful and adequate. __I guess the most common
> >> > application is the 1,001 desktop clock widgets. __Do you have any
> >> > feedback from any of the owners of those?
> >> >
> >> cron is another obvious one (or init systems attempting to replace
> >> cron). Having to wakeup and check the time every minute can be
> >> non-conducive to power savings, it would be better if we could just
> >> sleep until the next alarm and be woken up if the time changes in
> >> between.
> >>
> >> (That being said, we also need to poll for and/or check for timezone
> >> changes - but those are entirely userspace, so we can deal with that
> >> separately)
> >
> > Sure, there will be lots of applications.
> >
> > But what I'm asking isn't "it is a good feature".  I'm asking "is the
> > feature implemented well".  Ideally someone would get down and modify
> > cron to use the interface in this patch.
> >
> So I've just been thinking today - and I'm actually not sure whether
> this is needed at all for this case.
> 
> A good cron implementation is going to set timers according to
> CLOCK_REALTIME; in the case where the clock changes forwards, those
> timers will fire as part of the clock changing already no? And in the
> case where the clock changes backwards, you don't want to re-run old
> ones anyway.
> 
> Even the hourly/daily cases are actually at a fixed time, so would be
> triggered - and a decent implementation wouldn't trigger a given
> script more than once.

Yeah, I was wondering about today as well. Though when you set back
your clock several days, stuff might be surprised if it's not woken up
for several days :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes
  2011-03-11 19:56                   ` Thomas Gleixner
@ 2011-03-15  1:53                     ` Scott James Remnant
  0 siblings, 0 replies; 23+ messages in thread
From: Scott James Remnant @ 2011-03-15  1:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andrew Morton, Alexander Shishkin, linux-kernel, Ken MacLeod,
	Shaun Reich, Alexander Viro, Greg Kroah-Hartman, Feng Tang,
	Michael Tokarev, Marcelo Tosatti, John Stultz, Chris Friesen,
	Kay Sievers, Kirill A. Shutemov, Artem Bityutskiy, Davide Libenzi,
	linux-fsdevel, linux-api, Michael Kerrisk

On Fri, Mar 11, 2011 at 11:56 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 11 Mar 2011, Scott James Remnant wrote:
>> On Thu, Mar 10, 2011 at 12:25 AM, Andrew Morton
>> <akpm@linux-foundation.org> wrote:
>> > On Wed, 9 Mar 2011 18:01:09 -0800 Scott James Remnant <scott@netsplit.com> wrote:
>> >
>> >> > It would be helpful to know if the identified users of this feature
>> >> > actually find it useful and adequate. __I guess the most common
>> >> > application is the 1,001 desktop clock widgets. __Do you have any
>> >> > feedback from any of the owners of those?
>> >> >
>> >> cron is another obvious one (or init systems attempting to replace
>> >> cron). Having to wakeup and check the time every minute can be
>> >> non-conducive to power savings, it would be better if we could just
>> >> sleep until the next alarm and be woken up if the time changes in
>> >> between.
>> >>
>> >> (That being said, we also need to poll for and/or check for timezone
>> >> changes - but those are entirely userspace, so we can deal with that
>> >> separately)
>> >
>> > Sure, there will be lots of applications.
>> >
>> > But what I'm asking isn't "it is a good feature".  I'm asking "is the
>> > feature implemented well".  Ideally someone would get down and modify
>> > cron to use the interface in this patch.
>> >
>> So I've just been thinking today - and I'm actually not sure whether
>> this is needed at all for this case.
>>
>> A good cron implementation is going to set timers according to
>> CLOCK_REALTIME; in the case where the clock changes forwards, those
>> timers will fire as part of the clock changing already no? And in the
>> case where the clock changes backwards, you don't want to re-run old
>> ones anyway.
>>
>> Even the hourly/daily cases are actually at a fixed time, so would be
>> triggered - and a decent implementation wouldn't trigger a given
>> script more than once.
>
> Yeah, I was wondering about today as well. Though when you set back
> your clock several days, stuff might be surprised if it's not woken up
> for several days :)
>
I've checked the code, and more importantly, tested the
setting-forward example - timers do indeed fire at the point the clock
is wound forwards. This means there doesn't appear to be a utility for
this patch in the cron case.

In the wound back case, I believe that even current cron goes to some
effort to avoid firing events that have already happened?

Scott
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2011-03-15  1:53 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-03-09 14:36 [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes Alexander Shishkin
     [not found] ` <1299681411-9227-1-git-send-email-virtuoso-0lOfPCoBze7YtjvyW6yDsg@public.gmane.org>
2011-03-10  0:25   ` Andrew Morton
     [not found]     ` <20110309162513.5058c824.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2011-03-10  0:36       ` Kay Sievers
2011-03-10  8:19         ` Alexander Shishkin
2011-03-10  9:08         ` Thomas Gleixner
     [not found]           ` <alpine.LFD.2.00.1103101004540.2787-bi+AKbBUZKagILUCTcTcHdKyNwTtLsGr@public.gmane.org>
2011-03-10 11:16             ` Jamie Lokier
     [not found]               ` <20110310111622.GH29234-yetKDKU6eevNLxjTenLetw@public.gmane.org>
2011-03-10 11:41                 ` Thomas Gleixner
2011-03-10  2:01       ` Scott James Remnant
     [not found]         ` <AANLkTin7=rK2Sc2D7aMhPnPWU9LJq7KMSvwxv_PHERcK-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-03-10  8:25           ` Andrew Morton
     [not found]             ` <20110310002534.f984f8b2.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2011-03-11 19:51               ` Scott James Remnant
     [not found]                 ` <AANLkTi=_DG8uYLxDnXF=so5daUE=w-pxL67mYaFAGkkL-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-03-11 19:56                   ` Thomas Gleixner
2011-03-15  1:53                     ` Scott James Remnant
2011-03-10  8:10     ` Alexander Shishkin
2011-03-10  8:02 ` Kirill A. Shutemov
2011-03-10  8:15   ` Alexander Shishkin
2011-03-10  8:48 ` Arnd Bergmann
2011-03-10 14:19   ` Alexander Shishkin
2011-03-10  9:52 ` Thomas Gleixner
2011-03-10 14:12   ` Alexander Shishkin
2011-03-10 14:55     ` Thomas Gleixner
2011-03-10 15:43       ` Alexander Shishkin
2011-03-10 16:40         ` Thomas Gleixner
2011-03-10 21:57     ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).