* [PATCH 1/3] timekeeping: Fix lost updates to tai adjustment
[not found] <1387308457-25364-1-git-send-email-john.stultz@linaro.org>
@ 2013-12-17 19:27 ` John Stultz
2013-12-17 19:27 ` [PATCH 2/3] timekeeping: Fix potential lost pv notification of time change John Stultz
2013-12-17 19:27 ` [PATCH 3/3] timekeeping: Avoid possible deadlock from clock_was_set_delayed John Stultz
2 siblings, 0 replies; 8+ messages in thread
From: John Stultz @ 2013-12-17 19:27 UTC (permalink / raw)
To: LKML
Cc: John Stultz, Sasha Levin, Thomas Gleixner, Prarit Bhargava,
Richard Cochran, Ingo Molnar, stable
Since 48cdc135d4840 (Implement a shadow timekeeper), we have to
call timekeeping_update() after any adjustment to the timekeeping
structure in order to make sure that any adjustments to the structure
persist.
Unfortunately, the updates to the tai offset via adjtimex do not
trigger this update, causing adjustments to the tai offset to be
made and then over-written by the previous value at the next
update_wall_time() call.
This patch resovles the issue by calling timekeeping_update()
right after setting the tai offset.
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable <stable@vger.kernel.org> #3.10+
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
kernel/time/timekeeping.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 87b4f00..6bad3d9 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -610,6 +610,7 @@ void timekeeping_set_tai_offset(s32 tai_offset)
raw_spin_lock_irqsave(&timekeeper_lock, flags);
write_seqcount_begin(&timekeeper_seq);
__timekeeping_set_tai_offset(tk, tai_offset);
+ timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
write_seqcount_end(&timekeeper_seq);
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
clock_was_set();
@@ -1698,7 +1699,7 @@ int do_adjtimex(struct timex *txc)
if (tai != orig_tai) {
__timekeeping_set_tai_offset(tk, tai);
- update_pvclock_gtod(tk, true);
+ timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
clock_was_set_delayed();
}
write_seqcount_end(&timekeeper_seq);
--
1.8.3.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/3] timekeeping: Fix potential lost pv notification of time change
[not found] <1387308457-25364-1-git-send-email-john.stultz@linaro.org>
2013-12-17 19:27 ` [PATCH 1/3] timekeeping: Fix lost updates to tai adjustment John Stultz
@ 2013-12-17 19:27 ` John Stultz
2013-12-18 10:08 ` Ingo Molnar
2013-12-17 19:27 ` [PATCH 3/3] timekeeping: Avoid possible deadlock from clock_was_set_delayed John Stultz
2 siblings, 1 reply; 8+ messages in thread
From: John Stultz @ 2013-12-17 19:27 UTC (permalink / raw)
To: LKML
Cc: John Stultz, Sasha Levin, Thomas Gleixner, Ingo Molnar,
David Vrabel, Konrad Rzeszutek Wilk, Prarit Bhargava,
Richard Cochran, xen-devel, stable
In 780427f0e11 (Indicate that clock was set in the pvclock
gtod notifier), logic was added to pass a CLOCK_WAS_SET
notification to the pvclock notifier chain.
While that patch added a action flag returned from
accumulate_nsecs_to_secs(), it only uses the returned value
in one location, and not in the logarithmic accumulation.
This means if a leap second triggered during the logarithmic
accumulation (which is most likely where it would happen),
the notification that the clock was set would not make it to
the pv notifiers.
This patch extends the logarithmic_accumulation pass down
that action flag so proper notification will occur.
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: <xen-devel@lists.xen.org>
Cc: stable <stable@vger.kernel.org> #3.11+
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
kernel/time/timekeeping.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6bad3d9..998ec751 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1295,7 +1295,7 @@ static inline unsigned int accumulate_nsecs_to_secs(struct timekeeper *tk)
* Returns the unconsumed cycles.
*/
static cycle_t logarithmic_accumulation(struct timekeeper *tk, cycle_t offset,
- u32 shift)
+ u32 shift, unsigned int *action)
{
cycle_t interval = tk->cycle_interval << shift;
u64 raw_nsecs;
@@ -1309,7 +1309,7 @@ static cycle_t logarithmic_accumulation(struct timekeeper *tk, cycle_t offset,
tk->cycle_last += interval;
tk->xtime_nsec += tk->xtime_interval << shift;
- accumulate_nsecs_to_secs(tk);
+ *action |= accumulate_nsecs_to_secs(tk);
/* Accumulate raw time */
raw_nsecs = (u64)tk->raw_interval << shift;
@@ -1367,7 +1367,7 @@ static void update_wall_time(void)
struct timekeeper *tk = &shadow_timekeeper;
cycle_t offset;
int shift = 0, maxshift;
- unsigned int action;
+ unsigned int action = 0;
unsigned long flags;
raw_spin_lock_irqsave(&timekeeper_lock, flags);
@@ -1402,7 +1402,7 @@ static void update_wall_time(void)
maxshift = (64 - (ilog2(ntp_tick_length())+1)) - 1;
shift = min(shift, maxshift);
while (offset >= tk->cycle_interval) {
- offset = logarithmic_accumulation(tk, offset, shift);
+ offset = logarithmic_accumulation(tk, offset, shift, &action);
if (offset < tk->cycle_interval<<shift)
shift--;
}
@@ -1420,7 +1420,7 @@ static void update_wall_time(void)
* Finally, make sure that after the rounding
* xtime_nsec isn't larger than NSEC_PER_SEC
*/
- action = accumulate_nsecs_to_secs(tk);
+ action |= accumulate_nsecs_to_secs(tk);
write_seqcount_begin(&timekeeper_seq);
/* Update clock->cycle_last with the new value */
--
1.8.3.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 3/3] timekeeping: Avoid possible deadlock from clock_was_set_delayed
[not found] <1387308457-25364-1-git-send-email-john.stultz@linaro.org>
2013-12-17 19:27 ` [PATCH 1/3] timekeeping: Fix lost updates to tai adjustment John Stultz
2013-12-17 19:27 ` [PATCH 2/3] timekeeping: Fix potential lost pv notification of time change John Stultz
@ 2013-12-17 19:27 ` John Stultz
2 siblings, 0 replies; 8+ messages in thread
From: John Stultz @ 2013-12-17 19:27 UTC (permalink / raw)
To: LKML
Cc: John Stultz, Thomas Gleixner, Prarit Bhargava, Richard Cochran,
Ingo Molnar, Sasha Levin, stable
As part of normal operaions, the hrtimer subsystem frequently calls
into the timekeeping code, creating a locking order of
hrtimer locks -> timekeeping locks
clock_was_set_delayed() was suppoed to allow us to avoid deadlocks
between the timekeeping the hrtimer subsystem, so that we could
notify the hrtimer subsytem the time had changed while holding
the timekeeping locks. This was done by scheduling delayed work
that would run later once we were out of the timekeeing code.
But unfortunately the lock chains are complex enoguh that in
scheduling delayed work, we end up eventually trying to grab
an hrtimer lock.
Sasha Levin noticed this in testing when the new seqlock lockdep
enablement triggered the following (somewhat abrieviated) message:
[ 251.100221] ======================================================
[ 251.100221] [ INFO: possible circular locking dependency detected ]
[ 251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted
[ 251.101967] -------------------------------------------------------
[ 251.101967] kworker/10:1/4506 is trying to acquire lock:
[ 251.101967] (timekeeper_seq){----..}, at: [<ffffffff81160e96>] retrigger_next_event+0x56/0x70
[ 251.101967]
[ 251.101967] but task is already holding lock:
[ 251.101967] (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
[ 251.101967]
[ 251.101967] which lock already depends on the new lock.
[ 251.101967]
[ 251.101967]
[ 251.101967] the existing dependency chain (in reverse order) is:
[ 251.101967]
-> #5 (hrtimer_bases.lock#11){-.-...}:
[snipped]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[snipped]
-> #3 (&rq->lock){-.-.-.}:
[snipped]
-> #2 (&p->pi_lock){-.-.-.}:
[snipped]
-> #1 (&(&pool->lock)->rlock){-.-...}:
[ 251.101967] [<ffffffff81194803>] validate_chain+0x6c3/0x7b0
[ 251.101967] [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580
[ 251.101967] [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0
[ 251.101967] [<ffffffff84398500>] _raw_spin_lock+0x40/0x80
[ 251.101967] [<ffffffff81153e69>] __queue_work+0x1a9/0x3f0
[ 251.101967] [<ffffffff81154168>] queue_work_on+0x98/0x120
[ 251.101967] [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30
[ 251.101967] [<ffffffff811c4bd1>] do_adjtimex+0x111/0x160
[ 251.101967] [<ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70
[ 251.101967] [<ffffffff843a4b49>] ia32_sysret+0x0/0x5
[ 251.101967]
-> #0 (timekeeper_seq){----..}:
[snipped]
[ 251.101967] other info that might help us debug this:
[ 251.101967]
[ 251.101967] Chain exists of:
timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11
[ 251.101967] Possible unsafe locking scenario:
[ 251.101967]
[ 251.101967] CPU0 CPU1
[ 251.101967] ---- ----
[ 251.101967] lock(hrtimer_bases.lock#11);
[ 251.101967] lock(&rt_b->rt_runtime_lock);
[ 251.101967] lock(hrtimer_bases.lock#11);
[ 251.101967] lock(timekeeper_seq);
[ 251.101967]
[ 251.101967] *** DEADLOCK ***
[ 251.101967]
[ 251.101967] 3 locks held by kworker/10:1/4506:
[ 251.101967] #0: (events){.+.+.+}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
[ 251.101967] #1: (hrtimer_work){+.+...}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
[ 251.101967] #2: (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
[ 251.101967]
[ 251.101967] stack backtrace:
[ 251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053
[ 251.101967] Workqueue: events clock_was_set_work
So the best solution is to avoid calling clock_was_set_delayed() while
holding the timekeeping lock, and instead using a flag variable to
decide if we should call clock_was_set() once we've released the locks.
This works for the case here, where the do_adjtimex() was the deadlock
trigger point. Unfortuantely, in update_wall_time() we still hold
the jiffies lock, which would deadlock with the ipi triggered by
clock_was_set(), preventing us from calling it even after we drop the
timekeeping lock. So instead call clock_was_set_delayed() at that point.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: stable <stable@vger.kernel.org> #3.10+
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
kernel/time/timekeeping.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 998ec751..c1d36b6 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1278,7 +1278,6 @@ static inline unsigned int accumulate_nsecs_to_secs(struct timekeeper *tk)
__timekeeping_set_tai_offset(tk, tk->tai_offset - leap);
- clock_was_set_delayed();
action = TK_CLOCK_WAS_SET;
}
}
@@ -1440,6 +1439,19 @@ static void update_wall_time(void)
write_seqcount_end(&timekeeper_seq);
out:
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+ if (action & TK_CLOCK_WAS_SET) {
+ /*
+ * XXX - I'd rather we just call clock_was_set(), but
+ * since we're currently holding the jiffies lock, calling
+ * clock_was_set would trigger an ipi which would then grab
+ * the jiffies lock and we'd deadlock. :(
+ * The right solution should probably be droping
+ * the jiffies lock before calling update_wall_time
+ * but that requires some rework of the tick sched
+ * code.
+ */
+ clock_was_set_delayed();
+ }
}
/**
@@ -1700,11 +1712,13 @@ int do_adjtimex(struct timex *txc)
if (tai != orig_tai) {
__timekeeping_set_tai_offset(tk, tai);
timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
- clock_was_set_delayed();
}
write_seqcount_end(&timekeeper_seq);
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+ if (tai != orig_tai)
+ clock_was_set();
+
ntp_notify_cmos_timer();
return ret;
--
1.8.3.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/3] timekeeping: Fix potential lost pv notification of time change
2013-12-17 19:27 ` [PATCH 2/3] timekeeping: Fix potential lost pv notification of time change John Stultz
@ 2013-12-18 10:08 ` Ingo Molnar
2013-12-18 18:44 ` John Stultz
0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2013-12-18 10:08 UTC (permalink / raw)
To: John Stultz
Cc: LKML, Sasha Levin, Thomas Gleixner, David Vrabel,
Konrad Rzeszutek Wilk, Prarit Bhargava, Richard Cochran,
xen-devel, stable
* John Stultz <john.stultz@linaro.org> wrote:
> In 780427f0e11 (Indicate that clock was set in the pvclock
> gtod notifier), logic was added to pass a CLOCK_WAS_SET
> notification to the pvclock notifier chain.
>
> While that patch added a action flag returned from
> accumulate_nsecs_to_secs(), it only uses the returned value
> in one location, and not in the logarithmic accumulation.
>
> This means if a leap second triggered during the logarithmic
> accumulation (which is most likely where it would happen),
> the notification that the clock was set would not make it to
> the pv notifiers.
>
> This patch extends the logarithmic_accumulation pass down
> that action flag so proper notification will occur.
>
> Cc: Sasha Levin <sasha.levin@oracle.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: David Vrabel <david.vrabel@citrix.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Prarit Bhargava <prarit@redhat.com>
> Cc: Richard Cochran <richardcochran@gmail.com>
> Cc: <xen-devel@lists.xen.org>
> Cc: stable <stable@vger.kernel.org> #3.11+
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
> kernel/time/timekeeping.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 6bad3d9..998ec751 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -1295,7 +1295,7 @@ static inline unsigned int accumulate_nsecs_to_secs(struct timekeeper *tk)
> * Returns the unconsumed cycles.
> */
> static cycle_t logarithmic_accumulation(struct timekeeper *tk, cycle_t offset,
> - u32 shift)
> + u32 shift, unsigned int *action)
I have two complaints about this patch:
1)
I think the 'action' name sucks because it's too obfuscated. It's only
ever set to TK_CLOCK_WAS_SET, so why not name it more descriptively,
i.e. 'clock_was_set'?
2)
Secondly, the proliferation of parameters passed around I think calls
for a helper structure which would carry the (offset, shift,
clock_was_set) triple:
struct acc_params {
cycle_t offset;
u32 shift;
bool clock_was_set;
};
And then passed down like this:
> static cycle_t logarithmic_accumulation(struct timekeeper *tk, struct acc_params *params)
Agreed?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/3] timekeeping: Fix potential lost pv notification of time change
2013-12-18 10:08 ` Ingo Molnar
@ 2013-12-18 18:44 ` John Stultz
2013-12-18 19:14 ` John Stultz
2013-12-19 12:48 ` Ingo Molnar
0 siblings, 2 replies; 8+ messages in thread
From: John Stultz @ 2013-12-18 18:44 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Sasha Levin, Thomas Gleixner, David Vrabel,
Konrad Rzeszutek Wilk, Prarit Bhargava, Richard Cochran,
xen-devel, stable
On 12/18/2013 02:08 AM, Ingo Molnar wrote:
> * John Stultz <john.stultz@linaro.org> wrote:
>
>> In 780427f0e11 (Indicate that clock was set in the pvclock
>> gtod notifier), logic was added to pass a CLOCK_WAS_SET
>> notification to the pvclock notifier chain.
>>
>> While that patch added a action flag returned from
>> accumulate_nsecs_to_secs(), it only uses the returned value
>> in one location, and not in the logarithmic accumulation.
>>
>> This means if a leap second triggered during the logarithmic
>> accumulation (which is most likely where it would happen),
>> the notification that the clock was set would not make it to
>> the pv notifiers.
>>
>> This patch extends the logarithmic_accumulation pass down
>> that action flag so proper notification will occur.
>>
>> Cc: Sasha Levin <sasha.levin@oracle.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: David Vrabel <david.vrabel@citrix.com>
>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> Cc: Prarit Bhargava <prarit@redhat.com>
>> Cc: Richard Cochran <richardcochran@gmail.com>
>> Cc: <xen-devel@lists.xen.org>
>> Cc: stable <stable@vger.kernel.org> #3.11+
>> Signed-off-by: John Stultz <john.stultz@linaro.org>
>> ---
>> kernel/time/timekeeping.c | 10 +++++-----
>> 1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
>> index 6bad3d9..998ec751 100644
>> --- a/kernel/time/timekeeping.c
>> +++ b/kernel/time/timekeeping.c
>> @@ -1295,7 +1295,7 @@ static inline unsigned int accumulate_nsecs_to_secs(struct timekeeper *tk)
>> * Returns the unconsumed cycles.
>> */
>> static cycle_t logarithmic_accumulation(struct timekeeper *tk, cycle_t offset,
>> - u32 shift)
>> + u32 shift, unsigned int *action)
> I have two complaints about this patch:
>
> 1)
>
> I think the 'action' name sucks because it's too obfuscated. It's only
> ever set to TK_CLOCK_WAS_SET, so why not name it more descriptively,
> i.e. 'clock_was_set'?
Sure, I was reusing the existing variables, but no issue changing the
name here too.
> 2)
>
> Secondly, the proliferation of parameters passed around I think calls
> for a helper structure which would carry the (offset, shift,
> clock_was_set) triple:
>
> struct acc_params {
> cycle_t offset;
> u32 shift;
> bool clock_was_set;
> };
>
> And then passed down like this:
>
>> static cycle_t logarithmic_accumulation(struct timekeeper *tk, struct acc_params *params)
> Agreed?
Huh. Ok, I don't see the parameters structure likely being reused, so
this would be a special struct only for the logarithmic_accumulation() call?
Also, since we want to pass down TK_CLOCK_WAS_SET to timekeeping_update,
you ok with clock_was_set being an int instead of a bool?
thanks
-john
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/3] timekeeping: Fix potential lost pv notification of time change
2013-12-18 18:44 ` John Stultz
@ 2013-12-18 19:14 ` John Stultz
2013-12-19 12:48 ` Ingo Molnar
1 sibling, 0 replies; 8+ messages in thread
From: John Stultz @ 2013-12-18 19:14 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Sasha Levin, Thomas Gleixner, David Vrabel,
Konrad Rzeszutek Wilk, Prarit Bhargava, Richard Cochran,
xen-devel, stable
On 12/18/2013 10:44 AM, John Stultz wrote:
> On 12/18/2013 02:08 AM, Ingo Molnar wrote:
>> * John Stultz <john.stultz@linaro.org> wrote:
>>
>>> In 780427f0e11 (Indicate that clock was set in the pvclock
>>> gtod notifier), logic was added to pass a CLOCK_WAS_SET
>>> notification to the pvclock notifier chain.
>>>
>>> While that patch added a action flag returned from
>>> accumulate_nsecs_to_secs(), it only uses the returned value
>>> in one location, and not in the logarithmic accumulation.
>>>
>>> This means if a leap second triggered during the logarithmic
>>> accumulation (which is most likely where it would happen),
>>> the notification that the clock was set would not make it to
>>> the pv notifiers.
>>>
>>> This patch extends the logarithmic_accumulation pass down
>>> that action flag so proper notification will occur.
>>>
>>> Cc: Sasha Levin <sasha.levin@oracle.com>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Ingo Molnar <mingo@kernel.org>
>>> Cc: David Vrabel <david.vrabel@citrix.com>
>>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>> Cc: Prarit Bhargava <prarit@redhat.com>
>>> Cc: Richard Cochran <richardcochran@gmail.com>
>>> Cc: <xen-devel@lists.xen.org>
>>> Cc: stable <stable@vger.kernel.org> #3.11+
>>> Signed-off-by: John Stultz <john.stultz@linaro.org>
>>> ---
>>> kernel/time/timekeeping.c | 10 +++++-----
>>> 1 file changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
>>> index 6bad3d9..998ec751 100644
>>> --- a/kernel/time/timekeeping.c
>>> +++ b/kernel/time/timekeeping.c
>>> @@ -1295,7 +1295,7 @@ static inline unsigned int accumulate_nsecs_to_secs(struct timekeeper *tk)
>>> * Returns the unconsumed cycles.
>>> */
>>> static cycle_t logarithmic_accumulation(struct timekeeper *tk, cycle_t offset,
>>> - u32 shift)
>>> + u32 shift, unsigned int *action)
>> I have two complaints about this patch:
>>
>> 1)
>>
>> I think the 'action' name sucks because it's too obfuscated. It's only
>> ever set to TK_CLOCK_WAS_SET, so why not name it more descriptively,
>> i.e. 'clock_was_set'?
> Sure, I was reusing the existing variables, but no issue changing the
> name here too.
>
>
>> 2)
>>
>> Secondly, the proliferation of parameters passed around I think calls
>> for a helper structure which would carry the (offset, shift,
>> clock_was_set) triple:
>>
>> struct acc_params {
>> cycle_t offset;
>> u32 shift;
>> bool clock_was_set;
>> };
>>
>> And then passed down like this:
>>
>>> static cycle_t logarithmic_accumulation(struct timekeeper *tk, struct acc_params *params)
>> Agreed?
> Huh. Ok, I don't see the parameters structure likely being reused, so
> this would be a special struct only for the logarithmic_accumulation() call?
The other downside to this is it really doesn't improve the read-ability
of the update_wall_clock function, since
we end either end up changing all the offset, shift, action references
to param.* (which besides being a bit ugly, feels like way too much
change for an urgent patch). The alternative is to only pack the
structure right before we call which is ok, but seems like extra overhead.
Also passing the structure to the function makes it a little more
difficult to understand, as you lose the clarity of which values are
input-only and which values are modified or set by the function.
I do agree that some of recent changes (particularly the
shadow-timekeeping logic) has caused the code to feel a bit hairy and
has caused a number of these warts where we're passing extra arguments.
I've not had the time to take a deep look and see how things could be
re-factored to be a bit cleaner. But the code definitely needs some cleanup.
I'll re-factor the action->clock_was_set bit for this (urgent) patch
series, but I may want to push back on trying to do the logarithmic
accumulation parameters structure change for 3.14.
thanks
-john
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/3] timekeeping: Fix potential lost pv notification of time change
2013-12-18 18:44 ` John Stultz
2013-12-18 19:14 ` John Stultz
@ 2013-12-19 12:48 ` Ingo Molnar
1 sibling, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2013-12-19 12:48 UTC (permalink / raw)
To: John Stultz
Cc: LKML, Sasha Levin, Thomas Gleixner, David Vrabel,
Konrad Rzeszutek Wilk, Prarit Bhargava, Richard Cochran,
xen-devel, stable
* John Stultz <john.stultz@linaro.org> wrote:
> On 12/18/2013 02:08 AM, Ingo Molnar wrote:
> > * John Stultz <john.stultz@linaro.org> wrote:
> >
> >> In 780427f0e11 (Indicate that clock was set in the pvclock
> >> gtod notifier), logic was added to pass a CLOCK_WAS_SET
> >> notification to the pvclock notifier chain.
> >>
> >> While that patch added a action flag returned from
> >> accumulate_nsecs_to_secs(), it only uses the returned value
> >> in one location, and not in the logarithmic accumulation.
> >>
> >> This means if a leap second triggered during the logarithmic
> >> accumulation (which is most likely where it would happen),
> >> the notification that the clock was set would not make it to
> >> the pv notifiers.
> >>
> >> This patch extends the logarithmic_accumulation pass down
> >> that action flag so proper notification will occur.
> >>
> >> Cc: Sasha Levin <sasha.levin@oracle.com>
> >> Cc: Thomas Gleixner <tglx@linutronix.de>
> >> Cc: Ingo Molnar <mingo@kernel.org>
> >> Cc: David Vrabel <david.vrabel@citrix.com>
> >> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >> Cc: Prarit Bhargava <prarit@redhat.com>
> >> Cc: Richard Cochran <richardcochran@gmail.com>
> >> Cc: <xen-devel@lists.xen.org>
> >> Cc: stable <stable@vger.kernel.org> #3.11+
> >> Signed-off-by: John Stultz <john.stultz@linaro.org>
> >> ---
> >> kernel/time/timekeeping.c | 10 +++++-----
> >> 1 file changed, 5 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> >> index 6bad3d9..998ec751 100644
> >> --- a/kernel/time/timekeeping.c
> >> +++ b/kernel/time/timekeeping.c
> >> @@ -1295,7 +1295,7 @@ static inline unsigned int accumulate_nsecs_to_secs(struct timekeeper *tk)
> >> * Returns the unconsumed cycles.
> >> */
> >> static cycle_t logarithmic_accumulation(struct timekeeper *tk, cycle_t offset,
> >> - u32 shift)
> >> + u32 shift, unsigned int *action)
> > I have two complaints about this patch:
> >
> > 1)
> >
> > I think the 'action' name sucks because it's too obfuscated. It's only
> > ever set to TK_CLOCK_WAS_SET, so why not name it more descriptively,
> > i.e. 'clock_was_set'?
>
> Sure, I was reusing the existing variables, but no issue changing
> the name here too.
>
> > 2)
> >
> > Secondly, the proliferation of parameters passed around I think calls
> > for a helper structure which would carry the (offset, shift,
> > clock_was_set) triple:
> >
> > struct acc_params {
> > cycle_t offset;
> > u32 shift;
> > bool clock_was_set;
> > };
> >
> > And then passed down like this:
> >
> >> static cycle_t logarithmic_accumulation(struct timekeeper *tk, struct acc_params *params)
> > Agreed?
>
> Huh. Ok, I don't see the parameters structure likely being reused,
> so this would be a special struct only for the
> logarithmic_accumulation() call?
Yeah. If you think that's overkill then I'm fine with your original
result-pointer approach as well, as long as it's named talkatively.
> Also, since we want to pass down TK_CLOCK_WAS_SET to
> timekeeping_update, you ok with clock_was_set being an int instead
> of a bool?
Sure - I didn't notice that detail.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/3] timekeeping: Fix potential lost pv notification of time change
[not found] <1387565986-11738-1-git-send-email-john.stultz@linaro.org>
@ 2013-12-20 18:59 ` John Stultz
0 siblings, 0 replies; 8+ messages in thread
From: John Stultz @ 2013-12-20 18:59 UTC (permalink / raw)
To: LKML
Cc: John Stultz, Sasha Levin, Thomas Gleixner, Ingo Molnar,
David Vrabel, Konrad Rzeszutek Wilk, Prarit Bhargava,
Richard Cochran, xen-devel, stable
In 780427f0e11 (Indicate that clock was set in the pvclock
gtod notifier), logic was added to pass a CLOCK_WAS_SET
notification to the pvclock notifier chain.
While that patch added a action flag returned from
accumulate_nsecs_to_secs(), it only uses the returned value
in one location, and not in the logarithmic accumulation.
This means if a leap second triggered during the logarithmic
accumulation (which is most likely where it would happen),
the notification that the clock was set would not make it to
the pv notifiers.
This patch extends the logarithmic_accumulation pass down
that action flag so proper notification will occur.
This patch also changes the varialbe action -> clock_was_set
per Ingo's suggestion.
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: <xen-devel@lists.xen.org>
Cc: stable <stable@vger.kernel.org> #3.11+
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
kernel/time/timekeeping.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6bad3d9..d4c78f0 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1256,7 +1256,7 @@ out_adjust:
static inline unsigned int accumulate_nsecs_to_secs(struct timekeeper *tk)
{
u64 nsecps = (u64)NSEC_PER_SEC << tk->shift;
- unsigned int action = 0;
+ unsigned int clock_was_set = 0;
while (tk->xtime_nsec >= nsecps) {
int leap;
@@ -1279,10 +1279,10 @@ static inline unsigned int accumulate_nsecs_to_secs(struct timekeeper *tk)
__timekeeping_set_tai_offset(tk, tk->tai_offset - leap);
clock_was_set_delayed();
- action = TK_CLOCK_WAS_SET;
+ clock_was_set = TK_CLOCK_WAS_SET;
}
}
- return action;
+ return clock_was_set;
}
/**
@@ -1295,7 +1295,8 @@ static inline unsigned int accumulate_nsecs_to_secs(struct timekeeper *tk)
* Returns the unconsumed cycles.
*/
static cycle_t logarithmic_accumulation(struct timekeeper *tk, cycle_t offset,
- u32 shift)
+ u32 shift,
+ unsigned int *clock_was_set)
{
cycle_t interval = tk->cycle_interval << shift;
u64 raw_nsecs;
@@ -1309,7 +1310,7 @@ static cycle_t logarithmic_accumulation(struct timekeeper *tk, cycle_t offset,
tk->cycle_last += interval;
tk->xtime_nsec += tk->xtime_interval << shift;
- accumulate_nsecs_to_secs(tk);
+ *clock_was_set |= accumulate_nsecs_to_secs(tk);
/* Accumulate raw time */
raw_nsecs = (u64)tk->raw_interval << shift;
@@ -1367,7 +1368,7 @@ static void update_wall_time(void)
struct timekeeper *tk = &shadow_timekeeper;
cycle_t offset;
int shift = 0, maxshift;
- unsigned int action;
+ unsigned int clock_was_set = 0;
unsigned long flags;
raw_spin_lock_irqsave(&timekeeper_lock, flags);
@@ -1402,7 +1403,8 @@ static void update_wall_time(void)
maxshift = (64 - (ilog2(ntp_tick_length())+1)) - 1;
shift = min(shift, maxshift);
while (offset >= tk->cycle_interval) {
- offset = logarithmic_accumulation(tk, offset, shift);
+ offset = logarithmic_accumulation(tk, offset, shift,
+ &clock_was_set);
if (offset < tk->cycle_interval<<shift)
shift--;
}
@@ -1420,7 +1422,7 @@ static void update_wall_time(void)
* Finally, make sure that after the rounding
* xtime_nsec isn't larger than NSEC_PER_SEC
*/
- action = accumulate_nsecs_to_secs(tk);
+ clock_was_set |= accumulate_nsecs_to_secs(tk);
write_seqcount_begin(&timekeeper_seq);
/* Update clock->cycle_last with the new value */
@@ -1436,7 +1438,7 @@ static void update_wall_time(void)
* updating.
*/
memcpy(real_tk, tk, sizeof(*tk));
- timekeeping_update(real_tk, action);
+ timekeeping_update(real_tk, clock_was_set);
write_seqcount_end(&timekeeper_seq);
out:
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
--
1.8.3.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2013-12-20 18:59 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1387308457-25364-1-git-send-email-john.stultz@linaro.org>
2013-12-17 19:27 ` [PATCH 1/3] timekeeping: Fix lost updates to tai adjustment John Stultz
2013-12-17 19:27 ` [PATCH 2/3] timekeeping: Fix potential lost pv notification of time change John Stultz
2013-12-18 10:08 ` Ingo Molnar
2013-12-18 18:44 ` John Stultz
2013-12-18 19:14 ` John Stultz
2013-12-19 12:48 ` Ingo Molnar
2013-12-17 19:27 ` [PATCH 3/3] timekeeping: Avoid possible deadlock from clock_was_set_delayed John Stultz
[not found] <1387565986-11738-1-git-send-email-john.stultz@linaro.org>
2013-12-20 18:59 ` [PATCH 2/3] timekeeping: Fix potential lost pv notification of time change John Stultz
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).