From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: dzickus@redhat.com, sfr@canb.auug.org.au, linuxarm@huawei.com,
	Nicholas Piggin <npiggin@gmail.com>,
	abdhalee@linux.vnet.ibm.com, sparclinux@vger.kernel.org,
	akpm@linux-foundation.org, linuxppc-dev@lists.ozlabs.org,
	David Miller <davem@davemloft.net>,
	linux-arm-kernel@lists.infradead.org, tglx@linutronix.de
Subject: Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?
Date: Tue, 15 Aug 2017 08:47:43 -0700
Message-ID: <20170815154743.GK7017@linux.vnet.ibm.com>
In-Reply-To: <20170802172555.0000468a@huawei.com>

On Wed, Aug 02, 2017 at 05:25:55PM +0100, Jonathan Cameron wrote:
> On Tue, 1 Aug 2017 11:46:46 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Mon, Jul 31, 2017 at 04:27:57PM +0100, Jonathan Cameron wrote:
> > > On Mon, 31 Jul 2017 08:04:11 -0700
> > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > >   
> > > > On Mon, Jul 31, 2017 at 12:08:47PM +0100, Jonathan Cameron wrote:  
> > > > > On Fri, 28 Jul 2017 12:03:50 -0700
> > > > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > > >     
> > > > > > On Fri, Jul 28, 2017 at 06:27:05PM +0100, Jonathan Cameron wrote:    
> > > > > > > On Fri, 28 Jul 2017 09:55:29 -0700
> > > > > > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > > > > >       
> > > > > > > > On Fri, Jul 28, 2017 at 02:24:03PM +0100, Jonathan Cameron wrote:      
> > > > > > > > > On Fri, 28 Jul 2017 08:44:11 +0100
> > > > > > > > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:        
> > > > > > > > 
> > > > > > > > [ . . . ]
> > > > > > > >       
> > > > > > > > > > Ok.  Some info.  I disabled a few drivers (usb and SAS) in the interest of having
> > > > > > > > > fewer timer events.  Issue became much easier to trigger (on some runs before
> > > > > > > > > I could get tracing up and running)
> > > > > > > > > > 
> > > > > > > > > > So logs are large enough that pastebin doesn't like them - please shout if
> > > > > > > > > > another timer period is of interest.
> > > > > > > > > 
> > > > > > > > > https://pastebin.com/iUZDfQGM for the timer trace.
> > > > > > > > > https://pastebin.com/3w1F7amH for dmesg.  
> > > > > > > > > 
> > > > > > > > > The relevant timeout on the RCU stall detector was 8 seconds.  Event is
> > > > > > > > > detected around 835.
> > > > > > > > > 
> > > > > > > > > It's a lot of logs, so I haven't identified a smoking gun yet but there
> > > > > > > > > may well be one in there.        
> > > > > > > > 
> > > > > > > > The dmesg says:
> > > > > > > > 
> > > > > > > > rcu_preempt kthread starved for 2508 jiffies! g112 c111 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
> > > > > > > > 
> > > > > > > > So I look for "rcu_preempt" timer events and find these:
> > > > > > > > 
> > > > > > > > rcu_preempt-9     [019] ....   827.579114: timer_init: timer=ffff8017d5fc7da0
> > > > > > > > rcu_preempt-9     [019] d..1   827.579115: timer_start: timer=ffff8017d5fc7da0 function=process_timeout 
> > > > > > > > 
> > > > > > > > Next look for "ffff8017d5fc7da0" and I don't find anything else.      
> > > > > > > It does show up off the bottom of what would fit in pastebin...
> > > > > > > 
> > > > > > >      rcu_preempt-9     [001] d..1   837.681077: timer_cancel: timer=ffff8017d5fc7da0
> > > > > > >      rcu_preempt-9     [001] ....   837.681086: timer_init: timer=ffff8017d5fc7da0
> > > > > > >      rcu_preempt-9     [001] d..1   837.681087: timer_start: timer=ffff8017d5fc7da0 function=process_timeout expires=4295101298 [timeout=1] cpu=1 idx=0 flags=      
> > > > > > 
> > > > > > Odd.  I would expect an expiration...  And ten seconds is way longer
> > > > > > than the requested one jiffy!
> > > > > >     
> > > > > > > > The timeout was one jiffy, and more than a second later, no expiration.
> > > > > > > > Is it possible that this event was lost?  I am not seeing any sign of
> > > > > > > > this in the trace.
> > > > > > > > 
> > > > > > > > I don't see any sign of CPU hotplug (and I test with lots of that in
> > > > > > > > any case).
> > > > > > > > 
> > > > > > > > The last time we saw something like this it was a timer HW/driver problem,
> > > > > > > > but it is a bit hard to imagine such a problem affecting both ARM64
> > > > > > > > and SPARC.  ;-)      
> > > > > > > Could be different issues, both of which were hidden by that lockup detector.
> > > > > > > 
> > > > > > > There is an errata work around for the timers on this particular board.
> > > > > > > I'm only vaguely aware of it, so may be unconnected.
> > > > > > > 
> > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/clocksource/arm_arch_timer.c?h=v4.13-rc2&id=bb42ca47401010fc02901b5e8f79e40a26f208cb
> > > > > > > 
> > > > > > > Seems unlikely though! + we've not yet seen it on the other chips that
> > > > > > > errata affects (not that that means much).
> > > > > > 
> > > > > > If you can reproduce quickly, might be worth trying anyway...
> > > > > > 
> > > > > > 							Thanx, Paul    
> > > > > Errata fix is running already and was for all those tests.    
> > > > 
> > > > I was afraid of that...  ;-)  
> > > It's a pretty rare errata it seems.  Not actually managed to catch
> > > one yet.   
> > > >   
> > > > > I'll have a dig into the timers today and see where I get to.    
> > > > 
> > > > Look forward to seeing what you find!  
> > > Nothing obvious turning up other than we don't seem to have the issue
> > > when we aren't running hrtimers.
> > > 
> > > On the plus side I just got a report that it is affecting our d03
> > > boards, which is good on the basis I couldn't tell what the difference
> > > could be wrt this issue!
> > > 
> > > It indeed looks like we are consistently missing a timer before
> > > the rcu splat occurs.  
> > 
> > And for my part, my tests with CONFIG_HZ_PERIODIC=y and
> > CONFIG_RCU_FAST_NO_HZ=n showed roughly the same failure rate
> > as other runs.
> > 
> > Missing a timer can most certainly give RCU severe heartburn!  ;-)
> > Do you have what you need to track down the missing timer?  
> 
> Not managed to make much progress yet.  Turning on any additional tracing
> in that area seems to make the issue stop happening or at least
> occur very infrequently. Which certainly makes it 'fun' to find.
> 
> As a long shot I applied a locking fix from another reported issue that
> was causing rcu stalls and it seemed good for much longer, but
> eventually still occurred.
> 
> (from the thread rcu_sched stall while waiting in csd_lock_wait())
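
The timer check described in the quoted exchange (find a timer_start
for the rcu_preempt kthread, then look for the matching expiration or
cancellation) can be approximated offline on a saved trace.  What follows
is only a minimal sketch, not something posted in this thread: the
function name find_late_timers(), the regular expression, and the
one-second threshold are all assumptions made for illustration.  The two
sample lines are the timer_start and timer_cancel events quoted above.

```python
import re

# Matches ftrace timer events such as:
#   rcu_preempt-9  [019] d..1  827.579115: timer_start: timer=ffff8017d5fc7da0 ...
EVENT_RE = re.compile(
    r"\[(?P<cpu>\d+)\]\s+\S+\s+(?P<ts>\d+\.\d+):\s+"
    r"(?P<event>timer_\w+):\s+timer=(?P<timer>[0-9a-f]+)"
)


def find_late_timers(lines, max_wait=1.0):
    """Return (timer, start_ts, end_ts, waited) tuples for timers that stayed
    armed for more than max_wait seconds (an arbitrary, assumed threshold).
    end_ts is None when the trace ends while the timer is still armed."""
    armed = {}       # timer address -> timestamp of the pending timer_start
    late = []
    last_ts = 0.0
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        ts = float(m.group("ts"))
        last_ts = max(last_ts, ts)
        timer, event = m.group("timer"), m.group("event")
        if event == "timer_start":
            armed[timer] = ts
        elif event in ("timer_expire_entry", "timer_cancel"):
            start = armed.pop(timer, None)
            if start is not None and ts - start > max_wait:
                late.append((timer, start, ts, ts - start))
    # Timers that never expired or got cancelled before the trace ended.
    for timer, start in armed.items():
        if last_ts - start > max_wait:
            late.append((timer, start, None, last_ts - start))
    return late


SAMPLE = [
    "rcu_preempt-9     [019] d..1   827.579115: timer_start: "
    "timer=ffff8017d5fc7da0 function=process_timeout",
    "rcu_preempt-9     [001] d..1   837.681077: timer_cancel: "
    "timer=ffff8017d5fc7da0",
]

for timer, start, end, waited in find_late_timers(SAMPLE):
    print(f"timer {timer} armed at {start:.6f} waited {waited:.3f} s")
# prints: timer ffff8017d5fc7da0 armed at 827.579115 waited 10.102 s
```

On a full trace, the same function could be fed the lines of the saved
trace file; it only reports timers, it does not explain why they were late.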

On the perhaps unlikely off-chance that it helps locate something,
here is a patch that adds a trace_printk() to check how long a CPU
believes that it can sleep when going idle.  The thought is to check
to see if a CPU with a timer set to expire in one jiffy thinks that
it can sleep for (say) 30 seconds.

Didn't find anything for my problem, but I believe that yours is
different, so...

							Thanx, Paul

------------------------------------------------------------------------

commit 33103e7b1f89ef432dfe3337d2a6932cdf5c1312
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Mon Aug 14 08:54:39 2017 -0700

    EXP: Trace tick return from tick_nohz_stop_sched_tick
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index c7a899c5ce64..7358a5073dfb 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -817,6 +817,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	 * (not only the tick).
 	 */
 	ts->sleep_length = ktime_sub(dev->next_event, now);
+	trace_printk("tick_nohz_stop_sched_tick: %lld\n", (tick - ktime_get()) / 1000);
 	return tick;
 }
 
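One possible way to sift the output of the trace_printk() above is to
filter a saved trace for idle entries that expect an unusually long sleep.
This is a hedged sketch under stated assumptions, not part of the patch:
it assumes ktime_t is a plain nanosecond count in this kernel, so that the
printed value is in microseconds; it assumes the message appears at the
end of each trace line; and the helper name long_sleeps(), the one-second
limit, and the sample lines are invented for illustration.

```python
import re

# Assumed shape of a trace line carrying the trace_printk() message; only
# the trailing "tick_nohz_stop_sched_tick: <number>" comes from the patch.
SLEEP_RE = re.compile(
    r"\[(?P<cpu>\d+)\]\s+\S+\s+(?P<ts>\d+\.\d+):.*"
    r"tick_nohz_stop_sched_tick: (?P<usecs>-?\d+)\s*$"
)


def long_sleeps(lines, limit_us=1_000_000):
    """Return (cpu, timestamp, sleep_us) for every idle entry whose expected
    sleep exceeds limit_us microseconds.  Negative or small values are
    simply not reported."""
    hits = []
    for line in lines:
        m = SLEEP_RE.search(line)
        if m and int(m.group("usecs")) > limit_us:
            hits.append(
                (int(m.group("cpu")), float(m.group("ts")), int(m.group("usecs")))
            )
    return hits


# Made-up sample lines, NOT taken from a real trace.
EXAMPLE = [
    "  <idle>-0  [001] d..2  837.700000: tick_nohz_stop_sched_tick: "
    "tick_nohz_stop_sched_tick: 3996",
    "  <idle>-0  [002] d..2  837.700100: tick_nohz_stop_sched_tick: "
    "tick_nohz_stop_sched_tick: 30000000",
]

for cpu, ts, usecs in long_sleeps(EXAMPLE):
    print(f"CPU {cpu} at {ts:.6f} expects to sleep {usecs / 1e6:.1f} s")
```

Any CPU flagged this way could then be cross-checked against the timer
events for that CPU around the same timestamp.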


2017-07-31 15:27                                                                         ` Jonathan Cameron
2017-07-31 15:27                                                                         ` Jonathan Cameron
2017-08-01 18:46                                                                         ` Paul E. McKenney
2017-08-01 18:46                                                                           ` Paul E. McKenney
2017-08-01 18:46                                                                           ` Paul E. McKenney
2017-08-02 16:25                                                                           ` Jonathan Cameron
2017-08-02 16:25                                                                             ` Jonathan Cameron
2017-08-02 16:25                                                                             ` Jonathan Cameron
2017-08-15 15:47                                                                             ` Paul E. McKenney [this message]
2017-08-15 15:47                                                                               ` Paul E. McKenney
2017-08-15 15:47                                                                               ` Paul E. McKenney
2017-08-16  1:24                                                                               ` Jonathan Cameron
2017-08-16  1:24                                                                                 ` Jonathan Cameron
2017-08-16  1:24                                                                                 ` Jonathan Cameron
2017-08-16 12:43                                                                               ` Michael Ellerman
2017-08-16 12:43                                                                                 ` Michael Ellerman
2017-08-16 12:43                                                                                 ` Michael Ellerman
2017-08-16 12:56                                                                                 ` Paul E. McKenney
2017-08-16 12:56                                                                                   ` Paul E. McKenney
2017-08-16 12:56                                                                                   ` Paul E. McKenney
2017-08-16 15:31                                                                                   ` Nicholas Piggin
2017-08-16 15:31                                                                                     ` Nicholas Piggin
2017-08-16 15:31                                                                                     ` Nicholas Piggin
2017-08-16 16:27                                                                                   ` Paul E. McKenney
2017-08-16 16:27                                                                                     ` Paul E. McKenney
2017-08-16 16:27                                                                                     ` Paul E. McKenney
2017-08-17 13:55                                                                                     ` Michael Ellerman
2017-08-17 13:55                                                                                       ` Michael Ellerman
2017-08-17 13:55                                                                                       ` Michael Ellerman
2017-08-20  4:45                                                                                     ` Nicholas Piggin
2017-08-20  4:45                                                                                       ` Nicholas Piggin
2017-08-20  4:45                                                                                       ` Nicholas Piggin
2017-08-20  5:01                                                                                       ` David Miller
2017-08-20  5:01                                                                                         ` David Miller
2017-08-20  5:01                                                                                         ` David Miller
2017-08-20  5:04                                                                                       ` Paul E. McKenney
2017-08-20  5:04                                                                                         ` Paul E. McKenney
2017-08-20  5:04                                                                                         ` Paul E. McKenney
2017-08-20 13:00                                                                                       ` Nicholas Piggin
2017-08-20 13:00                                                                                         ` Nicholas Piggin
2017-08-20 13:00                                                                                         ` Nicholas Piggin
2017-08-20 18:35                                                                                         ` Paul E. McKenney
2017-08-20 18:35                                                                                           ` Paul E. McKenney
2017-08-20 18:35                                                                                           ` Paul E. McKenney
2017-08-20 21:14                                                                                           ` Paul E. McKenney
2017-08-20 21:14                                                                                             ` Paul E. McKenney
2017-08-20 21:14                                                                                             ` Paul E. McKenney
2017-08-21  0:52                                                                                             ` Nicholas Piggin
2017-08-21  0:52                                                                                               ` Nicholas Piggin
2017-08-21  0:52                                                                                               ` Nicholas Piggin
2017-08-21  6:06                                                                                               ` Nicholas Piggin
2017-08-21  6:06                                                                                                 ` Nicholas Piggin
2017-08-21  6:06                                                                                                 ` Nicholas Piggin
2017-08-21 10:18                                                                                                 ` Jonathan Cameron
2017-08-21 10:18                                                                                                   ` Jonathan Cameron
2017-08-21 10:18                                                                                                   ` Jonathan Cameron
2017-08-21 14:19                                                                                                   ` Nicholas Piggin
2017-08-21 14:19                                                                                                     ` Nicholas Piggin
2017-08-21 14:19                                                                                                     ` Nicholas Piggin
2017-08-21 15:02                                                                                                     ` Jonathan Cameron
2017-08-21 15:02                                                                                                       ` Jonathan Cameron
2017-08-21 15:02                                                                                                       ` Jonathan Cameron
2017-08-21 20:55                                                                                                     ` David Miller
2017-08-21 20:55                                                                                                       ` David Miller
2017-08-21 20:55                                                                                                       ` David Miller
2017-08-22  7:49                                                                                                       ` Jonathan Cameron
2017-08-22  7:49                                                                                                         ` Jonathan Cameron
2017-08-22  7:49                                                                                                         ` Jonathan Cameron
2017-08-22  8:51                                                                                                         ` Abdul Haleem
2017-08-22  8:51                                                                                                           ` Abdul Haleem
2017-08-22  8:51                                                                                                           ` Abdul Haleem
2017-08-22 15:26                                                                                                           ` Paul E. McKenney
2017-08-22 15:26                                                                                                             ` Paul E. McKenney
2017-08-22 15:26                                                                                                             ` Paul E. McKenney
2017-09-06 12:28                                                                                                             ` Paul E. McKenney
2017-09-06 12:28                                                                                                               ` Paul E. McKenney
2017-09-06 12:28                                                                                                               ` Paul E. McKenney
2017-08-22  0:38                                                                                               ` Paul E. McKenney
2017-08-22  0:38                                                                                                 ` Paul E. McKenney
2017-08-22  0:38                                                                                                 ` Paul E. McKenney
2017-07-31 11:09                                           ` Jonathan Cameron
2017-07-31 11:09                                             ` Jonathan Cameron
2017-07-31 11:09                                             ` Jonathan Cameron
2017-07-31 11:55                                             ` Jonathan Cameron
2017-07-31 11:55                                               ` Jonathan Cameron
2017-07-31 11:55                                               ` Jonathan Cameron
2017-08-01 10:53                                               ` Jonathan Cameron
2017-08-01 10:53                                                 ` Jonathan Cameron
2017-08-01 10:53                                                 ` Jonathan Cameron
2017-07-26 16:48                           ` David Miller
2017-07-26 16:48                             ` David Miller
2017-07-26 16:48                             ` David Miller
2017-07-26  3:53           ` Paul E. McKenney
2017-07-26  3:53             ` Paul E. McKenney
2017-07-26  3:53             ` Paul E. McKenney
2017-07-26  7:51             ` Jonathan Cameron
2017-07-26  7:51               ` Jonathan Cameron
2017-07-26  7:51               ` Jonathan Cameron
