public inbox for linux-kernel@vger.kernel.org
From: Yang Yingliang <yangyingliang@huawei.com>
To: Stephen Boyd <sboyd@kernel.org>, <linux-kernel@vger.kernel.org>
Cc: <tglx@linutronix.de>, <john.stultz@linaro.org>
Subject: Re: [PATCH] timer_list: avoid other cpu soft lockup when printing timer list
Date: Mon, 9 Mar 2020 16:20:23 +0800
Message-ID: <5E65FC47.8070102@huawei.com>
In-Reply-To: <158224928306.184098.11550548610262156729@swboyd.mtv.corp.google.com>

Hi,

Sorry for the late reply.

On 2020/2/21 9:41, Stephen Boyd wrote:
> Quoting Yang Yingliang (2020-02-19 19:42:32)
>> If a system has many CPUs (e.g. 128), it spends a long time printing
>> messages to the console when executing echo q > /proc/sysrq-trigger.
>>
>> When /proc/sys/kernel/numa_balancing is enabled and the migration threads
>> are woken up, the migration thread on the printing CPU can't run until
>> the printing finishes, so another migration thread may trigger a soft lockup.
>>
>> PID: 619    TASK: ffffa02fdd8bec80  CPU: 121  COMMAND: "migration/121"
>>    #0 [ffff00000a103b10] __crash_kexec at ffff0000081bf200
>>    #1 [ffff00000a103ca0] panic at ffff0000080ec93c
>>    #2 [ffff00000a103d80] watchdog_timer_fn at ffff0000081f8a14
>>    #3 [ffff00000a103e00] __run_hrtimer at ffff00000819701c
>>    #4 [ffff00000a103e40] __hrtimer_run_queues at ffff000008197420
>>    #5 [ffff00000a103ea0] hrtimer_interrupt at ffff00000819831c
>>    #6 [ffff00000a103f10] arch_timer_dying_cpu at ffff000008b53144
>>    #7 [ffff00000a103f30] handle_percpu_devid_irq at ffff000008174e34
>>    #8 [ffff00000a103f70] generic_handle_irq at ffff00000816c5e8
>>    #9 [ffff00000a103f90] __handle_domain_irq at ffff00000816d1f4
>>   #10 [ffff00000a103fd0] gic_handle_irq at ffff000008081860
>>   --- <IRQ stack> ---
>>   #11 [ffff00000d6e3d50] el1_irq at ffff0000080834c8
>>   #12 [ffff00000d6e3d60] multi_cpu_stop at ffff0000081d9964
>>   #13 [ffff00000d6e3db0] cpu_stopper_thread at ffff0000081d9cfc
>>   #14 [ffff00000d6e3e10] smpboot_thread_fn at ffff00000811e0a8
>>   #15 [ffff00000d6e3e70] kthread at ffff000008118988
>>
>> To avoid this soft lockup, add touch_all_softlockup_watchdogs()
>> to sysrq_timer_list_show().
>>
>> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
>> ---
>>   kernel/time/timer_list.c | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
>> index acb326f..4cb0e6f 100644
>> --- a/kernel/time/timer_list.c
>> +++ b/kernel/time/timer_list.c
>> @@ -289,13 +289,17 @@ void sysrq_timer_list_show(void)
>>   
>>          timer_list_header(NULL, now);
>>   
>> -       for_each_online_cpu(cpu)
>> +       for_each_online_cpu(cpu) {
>> +               touch_all_softlockup_watchdogs();
> Usage of touch_all_softlockup_watchdogs() deserves a comment. Otherwise
> the reader is left to git archaeology to understand why watchdogs are
> being touched. Of course, we failed at that with commit 010704276865
> ("sysrq: Reset the watchdog timers while displaying high-resolution
> timers") which looks awfully similar to this.
OK, I will add a comment later.
>
>>                  print_cpu(NULL, cpu, now);
>> +       }
>>   
>>   #ifdef CONFIG_GENERIC_CLOCKEVENTS
>>          timer_list_show_tickdevices_header(NULL);
>> -       for_each_online_cpu(cpu)
>> +       for_each_online_cpu(cpu) {
>> +               touch_all_softlockup_watchdogs();
>>                  print_tickdevice(NULL, tick_get_device(cpu), cpu);
> print_tickdevice() already has touch_nmi_watchdog() which eventually
> touches the softlockup watchdog. Is the problem that it isn't enough to
> do that when the migration thread is also running?
No, it's not enough.
The soft lockup occurs on the other CPUs, not on the CPU that is printing,
so the other CPUs' softlockup watchdogs need to be touched as well.
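
To illustrate the difference, here is a minimal userspace model, not kernel
code. It assumes that touch_nmi_watchdog() only refreshes the softlockup
timestamp of the CPU it runs on, and that the other CPUs are spinning in
multi_cpu_stop() and cannot refresh their own. All names in it (model_*,
NR_CPUS, SOFTLOCKUP_THRESH) and the numbers are hypothetical and chosen
only for this sketch.

```c
#include <stdbool.h>

#define NR_CPUS 4             /* hypothetical CPU count for the model */
#define SOFTLOCKUP_THRESH 20  /* hypothetical threshold, in seconds */

/* Last time each CPU's softlockup watchdog was touched (model only). */
static unsigned long watchdog_touch_ts[NR_CPUS];

/* Models touching only the watchdog of the CPU doing the printing. */
static void model_touch_local(int cpu, unsigned long now)
{
	watchdog_touch_ts[cpu] = now;
}

/* Models touching the watchdog of every CPU. */
static void model_touch_all(unsigned long now)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		watchdog_touch_ts[cpu] = now;
}

/* True if this CPU's watchdog would report a soft lockup at time 'now'. */
static bool model_is_softlockup(int cpu, unsigned long now)
{
	return now - watchdog_touch_ts[cpu] > SOFTLOCKUP_THRESH;
}

/*
 * Simulates 'printer_cpu' printing for 'duration' seconds while every
 * other CPU is assumed to be stuck waiting for it. The printer touches
 * the watchdog once per second, either locally or for all CPUs.
 * Returns how many CPUs would report a soft lockup at the end.
 */
static int model_print_timers(int printer_cpu, unsigned long duration,
			      bool touch_all)
{
	int stuck = 0;

	model_touch_all(0);	/* everyone starts with a fresh timestamp */

	for (unsigned long now = 1; now <= duration; now++) {
		if (touch_all)
			model_touch_all(now);
		else
			model_touch_local(printer_cpu, now);
	}

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (model_is_softlockup(cpu, duration))
			stuck++;

	return stuck;
}
```

In this model, printing for 30 seconds with only the local touch leaves
every CPU except the printing one past the threshold, while touching all
watchdogs leaves none of them past it.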

>
>> +       }
>>   #endif
>>          return;



