public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Frederic Weisbecker <frederic@kernel.org>
To: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Anna-Maria Gleixner <anna-maria@linutronix.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
Date: Wed, 16 Jan 2019 07:24:09 +0100	[thread overview]
Message-ID: <20190116062408.GA26169@lenoir> (raw)
In-Reply-To: <e838fa7f-f43f-7b3c-aae6-edb4f6734964@gmail.com>

On Fri, Dec 28, 2018 at 12:11:12AM +0100, Heiner Kallweit wrote:
> 
> # tracer: nop
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>           <idle>-0     [001] d.h2  1479.099092: softirq_raise: vec=1 [action=TIMER]
>           <idle>-0     [001] d.h2  1479.099098: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.099106: softirq_raise: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.099114: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.099120: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.099121: softirq_entry: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.099134: softirq_exit: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.099135: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.099141: softirq_exit: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.100094: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.100109: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.100116: softirq_exit: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.101091: softirq_raise: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.101113: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.101118: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] d.h2  1479.102094: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.102114: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.102121: softirq_exit: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.103091: softirq_raise: vec=1 [action=TIMER]
>           <idle>-0     [001] d.h2  1479.103097: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.103105: softirq_raise: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.103114: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.103118: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.103119: softirq_entry: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.103131: softirq_exit: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.103132: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.103138: softirq_exit: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.105092: softirq_raise: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.105115: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.105119: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] d.h2  1479.106092: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.106112: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] .Ns2  1479.106144: softirq_exit: vec=9 [action=RCU]
>          cpuhp/1-13    [001] d..2  1479.106279: timer_cancel: timer=0000000009a25653
>           <idle>-0     [001] d.h2  1479.106965: softirq_raise: vec=1 [action=TIMER]
>           <idle>-0     [001] d.h2  1479.106969: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.106974: softirq_raise: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.106981: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.106984: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.106985: softirq_entry: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.106994: softirq_exit: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.106995: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.106999: softirq_exit: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.107996: softirq_raise: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.108010: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.108014: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] d.h2  1479.109009: softirq_raise: vec=1 [action=TIMER]
>           <idle>-0     [001] d.h2  1479.109013: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.109024: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.109028: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.109028: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.109033: softirq_exit: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.110013: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.110033: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.110040: softirq_exit: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.111011: softirq_raise: vec=1 [action=TIMER]
>           <idle>-0     [001] d.h2  1479.111017: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.111026: softirq_raise: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.111035: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.111040: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.111040: softirq_entry: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.111052: softirq_exit: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.111052: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] .Ns2  1479.111079: softirq_exit: vec=9 [action=RCU]
>          cpuhp/1-13    [001] dNh2  1479.112930: softirq_raise: vec=1 [action=TIMER]
>          cpuhp/1-13    [001] dNh2  1479.112935: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] d..1  1479.113077: <stack trace>
>  => can_stop_idle_tick.isra.14
>  => tick_nohz_get_sleep_length
>  => menu_select
>  => cpuidle_select
>  => do_idle
>  => cpu_startup_entry
>  => start_secondary
>  => secondary_startup_64
>           <idle>-0     [001] .Ns2  1479.113110: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] .Ns2  1479.113114: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] .Ns2  1479.113115: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] .Ns2  1479.113139: softirq_exit: vec=9 [action=RCU]


So, I'm being very doubtful now actually.

First of all, I realize my patch only fixes things halfway. Only softirq raised from
hardirqs are going to be handled after ksoftirqd is parked. If it's raised outside
interrupts, the softirq is going to be ignored. If other smpboot kthreads, following
ksoftirqd parking, raise softirqs from their park() callback we should at least warn
that we are ignoring that.

Also, I'm still not sure if you're running threaded IRQs. Very likely not as
the softirqs in your traces are executed in idle, which means from an IRQ interrupting
idle. You haven't passed "threadirqs" in your kernel parameters, right?
In any case we need to figure out why your softirqs raised at 1479.112930 and
1479.112935 don't get handled.

Finally I don't understand why you have so many timer softirqs and yet no timer traces
at all, I turned them on in the script I gave you. So something weird is happening with
the timers code. At best we are wasting lots of CPU cycles with those empty softirqs, at
worst we are miscalculating some expiration somewhere, possibly ignoring a timer.

So I'll need to cook up an updated debug patch for you to trigger if you don't
mind.

Thanks.

  parent reply	other threads:[~2019-01-16  6:24 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-16  6:13 Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem Heiner Kallweit
2018-08-18 11:26 ` Thomas Gleixner
2018-08-18 22:34   ` Heiner Kallweit
2018-08-24  4:12 ` Frederic Weisbecker
2018-08-24  5:59   ` Heiner Kallweit
2018-08-24  8:01     ` Thomas Gleixner
2018-08-24 14:30       ` Frederic Weisbecker
2018-08-24 17:06         ` Heiner Kallweit
2018-08-28  2:25           ` Frederic Weisbecker
2018-09-27 16:05             ` Thomas Gleixner
2018-09-28 13:18               ` Frederic Weisbecker
2018-09-28 20:35                 ` Heiner Kallweit
2018-10-15 20:58                 ` Heiner Kallweit
2018-12-24 21:11                   ` Heiner Kallweit
2018-12-27  6:53                   ` Frederic Weisbecker
2018-12-27 23:11                     ` Heiner Kallweit
2018-12-28  1:31                       ` Frederic Weisbecker
2018-12-28  6:34                         ` Heiner Kallweit
2018-12-28  6:39                           ` Heiner Kallweit
2019-01-09 22:20                             ` Heiner Kallweit
2019-01-11 21:36                               ` Frederic Weisbecker
2019-01-16  6:24                       ` Frederic Weisbecker [this message]
2019-01-16 18:42                         ` Heiner Kallweit
2019-01-24 19:37                         ` Heiner Kallweit
2019-02-14 19:05                           ` Heiner Kallweit
2019-02-14 21:47                             ` Thomas Gleixner
2019-02-14 22:33                               ` Heiner Kallweit
2019-02-15  0:31                                 ` Frederic Weisbecker
2019-02-16  9:14                                   ` Heiner Kallweit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190116062408.GA26169@lenoir \
    --to=frederic@kernel.org \
    --cc=anna-maria@linutronix.de \
    --cc=hkallweit1@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox