From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756135AbbFBJCd (ORCPT <rfc822;w@1wt.eu>);
	Tue, 2 Jun 2015 05:02:33 -0400
Received: from mail-wg0-f44.google.com ([74.125.82.44]:34335 "EHLO
	mail-wg0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753808AbbFBJCA (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Tue, 2 Jun 2015 05:02:00 -0400
Date: Tue, 2 Jun 2015 11:01:54 +0200
From: Ingo Molnar <mingo@kernel.org>
To: John Stultz <john.stultz@linaro.org>
Cc: lkml <linux-kernel@vger.kernel.org>, Prarit Bhargava <prarit@redhat.com>,
        Daniel Bristot de Oliveira <bristot@redhat.com>,
        Richard Cochran <richardcochran@gmail.com>, Jan Kara <jack@suse.cz>,
        Jiri Bohac <jbohac@suse.cz>, Thomas Gleixner <tglx@linutronix.de>,
        Ingo Molnar <mingo@redhat.com>, Shuah Khan <shuahkh@osg.samsung.com>,
        Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime
 fastpaths
Message-ID: <20150602090154.GA2590@gmail.com>
References: <1432931068-4980-1-git-send-email-john.stultz@linaro.org>
 <1432931068-4980-5-git-send-email-john.stultz@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1432931068-4980-5-git-send-email-john.stultz@linaro.org>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


* John Stultz <john.stultz@linaro.org> wrote:

> Currently, leapsecond adjustments are done at tick time.
> 
> As a result, the leapsecond was applied at the first timer
> tick *after* the leapsecond (~1-10ms late depending on HZ),
> rather then exactly on the second edge.
> 
> This was in part historical from back when we were always
> tick based, but correcting this since has been avoided since
> it adds extra conditional checks in the gettime fastpath,
> which has performance overhead.
> 
> However, it was recently pointed out that ABS_TIME
> CLOCK_REALTIME timers set for right after the leapsecond
> could fire a second early, since some timers may be expired
> before we trigger the timekeeping timer, which then applies
> the leapsecond.
> 
> This isn't quite as bad as it sounds, since behaviorally
> it is similar to what is possible w/ ntpd made leapsecond
> adjustments done w/o using the kernel discipline. Where
> due to latencies, timers may fire just prior to the
> settimeofday call. (Also, one should note that all
> applications using CLOCK_REALTIME timers should always be
> careful, since they are prone to quirks from settimeofday()
> disturbances.)
> 
> However, the purpose of having the kernel do the leap adjustment
> is to avoid such latencies, so I think this is worth fixing.
> 
> So in order to properly keep those timers from firing a second
> early, this patch modifies the gettime accessors to do the
> extra checks to apply the leapsecond adjustment on the second
> edge. This prevents the timer core from expiring timers too
> early.
> 
> This patch does not handle VDSO time implementations, so
> userspace using vdso gettime will still see the leapsecond
> applied at the first timer tick after the leapsecond.
> This is a bit of a tradeoff, since the performance impact
> would be greatest to VDSO implementations, and since vdso
> interfaces don't provide the TIME_OOP flag, one can't
> distinquish the leapsecond from a time discontinuity (such
> as settimeofday), so correcting the VDSO may not be as
> important there.
> 
> Apologies to Richard Cochran, who pushed for such a change
> years ago, which I resisted due to the concerns about the
> performance overhead.
> 
> While I suspect this isn't extremely critical, folks who
> care about strict leap-second correctness will likely
> want to watch this, and it will likely be a -stable candidate.
> 
> Cc: Prarit Bhargava <prarit@redhat.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Richard Cochran <richardcochran@gmail.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Jiri Bohac <jbohac@suse.cz>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Shuah Khan <shuahkh@osg.samsung.com>
> Originally-suggested-by: Richard Cochran <richardcochran@gmail.com>
> Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
> Reported-by: Prarit Bhargava <prarit@redhat.com>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  include/linux/time64.h              |  1 +
>  include/linux/timekeeper_internal.h |  7 +++
>  kernel/time/ntp.c                   | 73 +++++++++++++++++++++++++---
>  kernel/time/ntp_internal.h          |  1 +
>  kernel/time/timekeeping.c           | 97 ++++++++++++++++++++++++++++++++-----
>  5 files changed, 159 insertions(+), 20 deletions(-)

So I don't like the complexity of this at all: why do we add over 100 lines of 
code for something that occurs (literally) once in a blue moon?

... and for that reason I'm not surprised at all that it broke in non-obvious 
ways.

Instead of having these super rare special events, how about implementing leap 
second smearing instead? That's far less radical and a lot easier to test as well, 
as it's a continuous mechanism. It will also confuse user-space a lot less, 
because there are no sudden time jumps.

Secondly, why is there a directional flag? I thought leap seconds can only be 
inserted.

So all in one, the leap second code is fragile and complex - lets re-think the 
whole topic instead of complicating it even more ...

Thanks,

	Ingo