From: "Cheng-Jui Wang (王正睿)" <Cheng-Jui.Wang@mediatek.com>
To: "frederic@kernel.org" <frederic@kernel.org>,
	"paulmck@kernel.org" <paulmck@kernel.org>,
	"rcu@vger.kernel.org" <rcu@vger.kernel.org>
Cc: wsd_upstream <wsd_upstream@mediatek.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"kernel-team@meta.com" <kernel-team@meta.com>,
	"Bobule Chang (張弘義)" <bobule.chang@mediatek.com>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	"Cheng-Jui Wang (王正睿)" <Cheng-Jui.Wang@mediatek.com>,
	"joel@joelfernandes.org" <joel@joelfernandes.org>
Subject: Re: [PATCH v3 rcu 3/3] rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks()
Date: Tue, 29 Oct 2024 02:20:51 +0000	[thread overview]
Message-ID: <b2b0bceacbd080e13b3aa7ad05569a787df4646d.camel@mediatek.com> (raw)
In-Reply-To: <c3218fbe-7bb1-4fd6-8b00-beee244ffeae@paulmck-laptop>

On Mon, 2024-10-28 at 17:22 -0700, Paul E. McKenney wrote:
> The result is that the current leaf rcu_node structure's ->lock is
> acquired only if a stack backtrace might be needed from the current CPU,
> and is held across only that CPU's backtrace. As a result, if there are

After upgrading our device to kernel-6.11, we encountered a lockup
scenario during a stall warning.
I had prepared a patch to submit, but I noticed that this series
already addresses part of the problem, though it has not yet been
merged into mainline. So I decided to reply to this series to discuss
how to fix it before pushing a patch. Here is the lockup scenario we
encountered:

Device: arm64 with only 8 cores.
One CPU holds rnp->lock in rcu_dump_cpu_stacks() while trying to dump
the other CPUs, but it waits for each corresponding CPU to dump its
backtrace, with a 10-second timeout per CPU.

   __delay()
   __const_udelay()
   nmi_trigger_cpumask_backtrace()
   arch_trigger_cpumask_backtrace()
   trigger_single_cpu_backtrace()
   dump_cpu_task()
   rcu_dump_cpu_stacks()  <- holding rnp->lock
   print_other_cpu_stall()
   check_cpu_stall()
   rcu_pending()
   rcu_sched_clock_irq()
   update_process_times()

However, the other 7 CPUs are spinning on rnp->lock on the path to
report a quiescent state:

   queued_spin_lock_slowpath()
   queued_spin_lock()
   do_raw_spin_lock()
   __raw_spin_lock_irqsave()
   _raw_spin_lock_irqsave()
   rcu_report_qs_rdp()
   rcu_check_quiescent_state()
   rcu_core()
   rcu_core_si()
   handle_softirqs()
   __do_softirq()
   ____do_softirq()
   call_on_irq_stack()

Since the arm64 architecture uses a regular IPI rather than a true NMI
to implement arch_trigger_cpumask_backtrace(), spin_lock_irqsave()'s
disabling of interrupts is enough to block this IPI request.
Therefore, if the other CPUs start waiting for the lock before
receiving the IPI, a semi-deadlock scenario like the following occurs:

CPU0                    CPU1                    CPU2
-----                   -----                   -----
lock_irqsave(rnp->lock)
                        lock_irqsave(rnp->lock)
                        <can't receive IPI>
<send ipi to CPU 1>
<wait CPU 1 for 10s>
                                                lock_irqsave(rnp->lock)
                                                <can't receive IPI>
<send ipi to CPU 2>
<wait CPU 2 for 10s>
...


In our scenario, with 7 CPUs to dump, the lockup lasts nearly 70
seconds, preventing subsequent useful logs from being printed and
ultimately leading to a watchdog timeout and system reboot.

This series re-acquires the lock between dumps, significantly reducing
the lock-holding time. However, since the lock is still held while
dumping a CPU's backtrace, two CPUs can still end up waiting on each
other for 10 seconds, which is still too long.
So I would like to ask: is it necessary to dump the backtrace within
the spinlock section?
If not, especially now that the lockless checks are possible, maybe it
can be changed as follows?

-			if (!(data_race(rnp->qsmask) & leaf_node_cpu_bit(rnp, cpu)))
-				continue;
-			raw_spin_lock_irqsave_rcu_node(rnp, flags);
-			if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
+			if (data_race(rnp->qsmask) & leaf_node_cpu_bit(rnp, cpu)) {
 				if (cpu_is_offline(cpu))
 					pr_err("Offline CPU %d blocking current GP.\n", cpu);
 				else
 					dump_cpu_task(cpu);
 			}
-			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);

Or should this be considered an arm64 issue: they should switch to a
true NMI, and otherwise should not use
nmi_trigger_cpumask_backtrace()?



Thread overview: 28+ messages
2024-10-09 18:05 [PATCH rcu 0/3] RCU CPU stall-warning changes for v6.13 Paul E. McKenney
2024-10-09 18:05 ` [PATCH rcu 1/3] rcu: Delete unused rcu_gp_might_be_stalled() function Paul E. McKenney
2024-10-09 18:05 ` [PATCH rcu 2/3] rcu: Stop stall warning from dumping stacks if grace period ends Paul E. McKenney
2024-10-15 18:48   ` Joel Fernandes
2024-10-09 18:05 ` [PATCH rcu 3/3] rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks() Paul E. McKenney
2024-10-15 18:49 ` [PATCH rcu 0/3] RCU CPU stall-warning changes for v6.13 Joel Fernandes
2024-10-15 23:02   ` Paul E. McKenney
2024-10-16  0:01     ` Joel Fernandes
2024-10-16 16:18 ` [PATCH v2 " Paul E. McKenney
2024-10-16 16:19   ` [PATCH v2 rcu 1/3] rcu: Delete unused rcu_gp_might_be_stalled() function Paul E. McKenney
2024-10-16 16:19   ` [PATCH v2 rcu 2/3] rcu: Stop stall warning from dumping stacks if grace period ends Paul E. McKenney
2024-10-16 16:19   ` [PATCH v2 rcu 3/3] rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks() Paul E. McKenney
2024-10-29  0:22     ` [PATCH v3 " Paul E. McKenney
2024-10-29  2:20       ` Cheng-Jui Wang (王正睿) [this message]
2024-10-29 16:29         ` Paul E. McKenney
2024-10-30  3:55           ` Cheng-Jui Wang (王正睿)
2024-10-30 13:54             ` Paul E. McKenney
2024-10-30 20:12               ` Doug Anderson
2024-10-30 23:26                 ` Paul E. McKenney
2024-10-31  0:21                   ` Doug Anderson
2024-10-31  5:03                     ` Paul E. McKenney
2024-10-31 21:27                       ` Doug Anderson
2024-11-01  1:44                         ` Cheng-Jui Wang (王正睿)
2024-11-01 13:55                           ` Paul E. McKenney
2025-10-30  8:30                             ` Tze-nan Wu (吳澤南)
2024-11-01  7:41               ` Cheng-Jui Wang (王正睿)
2024-11-01 13:59                 ` Paul E. McKenney
2024-10-18 21:49   ` [PATCH v2 rcu 0/3] RCU CPU stall-warning changes for v6.13 Frederic Weisbecker
