[PATCH] nohz: fix race allowing use of stale jiffies when waking

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] nohz: fix race allowing use of stale jiffies when waking
@ 2012-01-12  8:55 Milton Miller
  2012-01-12  9:49 ` Eric Dumazet
  0 siblings, 1 reply; 4+ messages in thread
From: Milton Miller @ 2012-01-12  8:55 UTC (permalink / raw)
  To: Thomas Gleixner, John Stultz; +Cc: linux-kernel, Paul E. McKenney

When waking up from nohz mode, all cpus call tick_do_update_jiffies64
regardless of tick_do_timer_cpu as it could be no cpu was assigned.

At the start of the function there is a quick lockless check to
determine if jiffies is current.  The check uses last_jiffies_update,
which is used to calculate when to perform the next increment.
Unfortunately it is updated when how many jiffies to advance the
clock is calculated, before the call to do_timer which actually
updates jiffies.  A second cpu waking up could use the (potentially
very) stale jiffies value during this window.

This patch changes the check to be against tick_next_period, which
is updated after the call to do_timer completes.  It compares the
result of subtraction to zero, but this is safe as ktime_sub returns
ktime_t which is s64, as signed type.

I found this race while trying to track down reports of network adapter
hangs on a large system.  I suspected premature false detection so
I added logging when the locked region determined a multiple jiffie
update would be required.  I noticed that it happened frequently when
tick_do_timer_cpu was NONE (-1), and realized the large update was
when all cpus were previously in nohz.  I then thought about what
would happen if multiple cpus woke up near close to each other in
time and decided the stale jiffies would be used.  (I later found at
least part of the hung adapter reports were due to faulty detection
logic that has since changed upstream.)

Signed-off-by: Milton Miller <miltonm@bga.com>
Cc: stable@vger.kernel.org
--- 
Patch was generated and tested against 2.6.36; I verified it applies
with offset -1 line to next-20120111.

Index: src/kernel/time/tick-sched.c
===================================================================
--- src.orig/kernel/time/tick-sched.c	2011-10-13 17:42:16.000000000 -0500
+++ src/kernel/time/tick-sched.c	2011-10-13 17:45:31.000000000 -0500
@@ -52,8 +52,8 @@ static void tick_do_update_jiffies64(kti
 	/*
 	 * Do a quick check without holding xtime_lock:
 	 */
-	delta = ktime_sub(now, last_jiffies_update);
-	if (delta.tv64 < tick_period.tv64)
+	delta = ktime_sub(now, tick_next_period);
+	if (delta.tv64 < 0)
 		return;

 	/* Reevalute with xtime_lock held */

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] nohz: fix race allowing use of stale jiffies when waking
  2012-01-12  8:55 [PATCH] nohz: fix race allowing use of stale jiffies when waking Milton Miller
@ 2012-01-12  9:49 ` Eric Dumazet
  2012-01-14  5:02   ` Milton Miller
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Dumazet @ 2012-01-12  9:49 UTC (permalink / raw)
  To: Milton Miller
  Cc: Thomas Gleixner, John Stultz, linux-kernel, Paul E. McKenney

Le jeudi 12 janvier 2012 à 02:55 -0600, Milton Miller a écrit :
> When waking up from nohz mode, all cpus call tick_do_update_jiffies64
> regardless of tick_do_timer_cpu as it could be no cpu was assigned.
> 
> At the start of the function there is a quick lockless check to
> determine if jiffies is current.  The check uses last_jiffies_update,
> which is used to calculate when to perform the next increment.
> Unfortunately it is updated when how many jiffies to advance the
> clock is calculated, before the call to do_timer which actually
> updates jiffies.  A second cpu waking up could use the (potentially
> very) stale jiffies value during this window.
> 
> This patch changes the check to be against tick_next_period, which
> is updated after the call to do_timer completes.  It compares the
> result of subtraction to zero, but this is safe as ktime_sub returns
> ktime_t which is s64, as signed type.
> 
> I found this race while trying to track down reports of network adapter
> hangs on a large system.  I suspected premature false detection so
> I added logging when the locked region determined a multiple jiffie
> update would be required.  I noticed that it happened frequently when
> tick_do_timer_cpu was NONE (-1), and realized the large update was
> when all cpus were previously in nohz.  I then thought about what
> would happen if multiple cpus woke up near close to each other in
> time and decided the stale jiffies would be used.  (I later found at
> least part of the hung adapter reports were due to faulty detection
> logic that has since changed upstream.)
> 
> Signed-off-by: Milton Miller <miltonm@bga.com>
> Cc: stable@vger.kernel.org
> --- 
> Patch was generated and tested against 2.6.36; I verified it applies
> with offset -1 line to next-20120111.
> 
> Index: src/kernel/time/tick-sched.c
> ===================================================================
> --- src.orig/kernel/time/tick-sched.c	2011-10-13 17:42:16.000000000 -0500
> +++ src/kernel/time/tick-sched.c	2011-10-13 17:45:31.000000000 -0500
> @@ -52,8 +52,8 @@ static void tick_do_update_jiffies64(kti
>  	/*
>  	 * Do a quick check without holding xtime_lock:
>  	 */
> -	delta = ktime_sub(now, last_jiffies_update);
> -	if (delta.tv64 < tick_period.tv64)
> +	delta = ktime_sub(now, tick_next_period);
> +	if (delta.tv64 < 0)
>  		return;
>  

Given ktime_t on 32bit arches is not an atomic type, I wonder how safe
is this anyway...




^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] nohz: fix race allowing use of stale jiffies when waking
  2012-01-12  9:49 ` Eric Dumazet
@ 2012-01-14  5:02   ` Milton Miller
  2012-03-22  1:14     ` John Stultz
  0 siblings, 1 reply; 4+ messages in thread
From: Milton Miller @ 2012-01-14  5:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Thomas Gleixner, John Stultz, linux-kernel, Paul E. McKenney

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 4778 bytes --]

On Thu, 12 Jan 2012 about 10:49:15 +0100 Eric Dumazet wrote:
> Le jeudi 12 janvier 2012 à 02:55 -0600, Milton Miller a écrit :
> > When waking up from nohz mode, all cpus call tick_do_update_jiffies64
> > regardless of tick_do_timer_cpu as it could be no cpu was assigned.
> > 
> > At the start of the function there is a quick lockless check to
> > determine if jiffies is current.  The check uses last_jiffies_update,
> > which is used to calculate when to perform the next increment.
> > Unfortunately it is updated when how many jiffies to advance the
> > clock is calculated, before the call to do_timer which actually
> > updates jiffies.  A second cpu waking up could use the (potentially
> > very) stale jiffies value during this window.
> > 
> > This patch changes the check to be against tick_next_period, which
> > is updated after the call to do_timer completes.  It compares the
> > result of subtraction to zero, but this is safe as ktime_sub returns
> > ktime_t which is s64, as signed type.
> > 
> > I found this race while trying to track down reports of network adapter
> > hangs on a large system.  I suspected premature false detection so
> > I added logging when the locked region determined a multiple jiffie
> > update would be required.  I noticed that it happened frequently when
> > tick_do_timer_cpu was NONE (-1), and realized the large update was
> > when all cpus were previously in nohz.  I then thought about what
> > would happen if multiple cpus woke up near close to each other in
> > time and decided the stale jiffies would be used.  (I later found at
> > least part of the hung adapter reports were due to faulty detection
> > logic that has since changed upstream.)
> > 
> > Signed-off-by: Milton Miller <miltonm@bga.com>
> > Cc: stable@vger.kernel.org
> > --- 
> > Patch was generated and tested against 2.6.36; I verified it applies
> > with offset -1 line to next-20120111.
> > 
> > Index: src/kernel/time/tick-sched.c
> > ===================================================================
> > --- src.orig/kernel/time/tick-sched.c	2011-10-13 17:42:16.000000000 -0500
> > +++ src/kernel/time/tick-sched.c	2011-10-13 17:45:31.000000000 -0500
> > @@ -52,8 +52,8 @@ static void tick_do_update_jiffies64(kti
> >  	/*
> >  	 * Do a quick check without holding xtime_lock:
> >  	 */
> > -	delta = ktime_sub(now, last_jiffies_update);
> > -	if (delta.tv64 < tick_period.tv64)
> > +	delta = ktime_sub(now, tick_next_period);
> > +	if (delta.tv64 < 0)
> >  		return;
> >  
> 
> Given ktime_t on 32bit arches is not an atomic type, I wonder how safe
> is this anyway...
> 

Ok I admit I hadn't thought about it, and initially I was going to
think of something involving comparing the two timestamps, and
waiting if next_period <= next_jiffies_update (with approprate
subtract and compare).

But then I thought some more and comparing the timestamp after the
update is safe:

case 1) We see neither half.   We compare now to old value, and
decide to take the lock and check again.

case 2) We see the new value.  We compare and decide we don't need
to take the lock.  Big win.

case 3) We see the the lower part is updated to a smaller
value but the upper part is still the old value.  The time
for update seems to be in the past and we wait take thee lock
and check again.

case 4) We see a partial update.  The upper half is a new larger
value but the lower half is the old, higher value.   In this case we
think the jiffy will be valid further into the future than we think
it should be and skip waiting for the lock.  This state is usually
quite transitory and we just got done updating jiffies, so its not
likely that jiffies should actually need an update.

I use weasel words here because there is a window where the update
has been half performed and the cpu doing the update stalls (eg
gets time sliced out by its hypervisor).  In this small window
we could continue to use the updated jiffies beyond its expiration
time instead of waiting for the updating cpu to finsih storing the
new expiration time and release the lock.

There are a couple additional points to consider in this scenerio.
One is that the cpu still has xtime lock so any attempt to read a
high precision time will stall.  The second is if the cpu updating
the jiffies is stalled by the hypervisor, then it is not unique to
when it is waking from nohz and is likely happing when it owns
timer duty, so time will be subject to bunching and jumping jiffies
on a regular baasis.  About the most we could do is detect it, either
by taking periodic helath checks of jiffie by other cpus or noticing
that our tick update is constantly behind.

So I think the updated racy check is fine, but will expand on the
racy check comment why it is safe if that is desired.

milton

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] nohz: fix race allowing use of stale jiffies when waking
  2012-01-14  5:02   ` Milton Miller
@ 2012-03-22  1:14     ` John Stultz
  0 siblings, 0 replies; 4+ messages in thread
From: John Stultz @ 2012-03-22  1:14 UTC (permalink / raw)
  To: Milton Miller
  Cc: Eric Dumazet, Thomas Gleixner, linux-kernel, Paul E. McKenney

On 01/13/2012 09:02 PM, Milton Miller wrote:
> On Thu, 12 Jan 2012 about 10:49:15 +0100 Eric Dumazet wrote:
>> Le jeudi 12 janvier 2012 Ã  02:55 -0600, Milton Miller a Ã©crit :
>>> When waking up from nohz mode, all cpus call tick_do_update_jiffies64
>>> regardless of tick_do_timer_cpu as it could be no cpu was assigned.
>>>
>>> At the start of the function there is a quick lockless check to
>>> determine if jiffies is current.  The check uses last_jiffies_update,
>>> which is used to calculate when to perform the next increment.
>>> Unfortunately it is updated when how many jiffies to advance the
>>> clock is calculated, before the call to do_timer which actually
>>> updates jiffies.  A second cpu waking up could use the (potentially
>>> very) stale jiffies value during this window.
>>>
>>> This patch changes the check to be against tick_next_period, which
>>> is updated after the call to do_timer completes.  It compares the
>>> result of subtraction to zero, but this is safe as ktime_sub returns
>>> ktime_t which is s64, as signed type.
>>>
>>> I found this race while trying to track down reports of network adapter
>>> hangs on a large system.  I suspected premature false detection so
>>> I added logging when the locked region determined a multiple jiffie
>>> update would be required.  I noticed that it happened frequently when
>>> tick_do_timer_cpu was NONE (-1), and realized the large update was
>>> when all cpus were previously in nohz.  I then thought about what
>>> would happen if multiple cpus woke up near close to each other in
>>> time and decided the stale jiffies would be used.  (I later found at
>>> least part of the hung adapter reports were due to faulty detection
>>> logic that has since changed upstream.)
>>>
>>> Signed-off-by: Milton Miller<miltonm@bga.com>
>>> Cc: stable@vger.kernel.org
>>> ---
>>> Patch was generated and tested against 2.6.36; I verified it applies
>>> with offset -1 line to next-20120111.
>>>
>>> Index: src/kernel/time/tick-sched.c
>>> ===================================================================
>>> --- src.orig/kernel/time/tick-sched.c	2011-10-13 17:42:16.000000000 -0500
>>> +++ src/kernel/time/tick-sched.c	2011-10-13 17:45:31.000000000 -0500
>>> @@ -52,8 +52,8 @@ static void tick_do_update_jiffies64(kti
>>>   	/*
>>>   	 * Do a quick check without holding xtime_lock:
>>>   	 */
>>> -	delta = ktime_sub(now, last_jiffies_update);
>>> -	if (delta.tv64<  tick_period.tv64)
>>> +	delta = ktime_sub(now, tick_next_period);
>>> +	if (delta.tv64<  0)
>>>   		return;
>>>
>> Given ktime_t on 32bit arches is not an atomic type, I wonder how safe
>> is this anyway...
>>
> Ok I admit I hadn't thought about it, and initially I was going to
> think of something involving comparing the two timestamps, and
> waiting if next_period<= next_jiffies_update (with approprate
> subtract and compare).
>
> But then I thought some more and comparing the timestamp after the
> update is safe:
[snipped]


> There are a couple additional points to consider in this scenerio.
> One is that the cpu still has xtime lock so any attempt to read a
> high precision time will stall.  The second is if the cpu updating
> the jiffies is stalled by the hypervisor, then it is not unique to
> when it is waking from nohz and is likely happing when it owns
> timer duty, so time will be subject to bunching and jumping jiffies
> on a regular baasis.  About the most we could do is detect it, either
> by taking periodic helath checks of jiffie by other cpus or noticing
> that our tick update is constantly behind.
>
> So I think the updated racy check is fine, but will expand on the
> racy check comment why it is safe if that is desired.
>
So,  what happened with this patch?   Is there a updated version with 
the improved documentation covered in this mail?

thanks
-john


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2012-03-22  1:14 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-01-12  8:55 [PATCH] nohz: fix race allowing use of stale jiffies when waking Milton Miller
2012-01-12  9:49 ` Eric Dumazet
2012-01-14  5:02   ` Milton Miller
2012-03-22  1:14     ` John Stultz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox