From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: catalin.marinas@arm.com, dennis.chen@arm.com,
jiangshanlai@gmail.com, josh@joshtriplett.org,
linux-kernel@vger.kernel.org, mathieu.desnoyers@efficios.com,
rostedt@goodmis.org, steve.capper@arm.com, will.deacon@arm.com
Subject: Re: [PATCHv2] rcu: tree: correctly handle sparse possible CPUs
Date: Tue, 17 May 2016 12:01:06 -0700 [thread overview]
Message-ID: <20160517190106.GJ3528@linux.vnet.ibm.com> (raw)
In-Reply-To: <1463480530-5674-1-git-send-email-mark.rutland@arm.com>
On Tue, May 17, 2016 at 11:22:10AM +0100, Mark Rutland wrote:
> In many cases in the RCU tree code, we iterate over the set of CPUs for
> a leaf node described by rcu_node::grplo and rcu_node::grphi, checking
> per-cpu data for each CPU in this range. However, if the set of possible
> CPUs is sparse, some CPUs described in this range are not possible, and
> thus no per-cpu region will have been allocated (or initialised) for
> them by the generic percpu code.
>
> Erroneous accesses to a per-cpu area for these !possible CPUs may fault
> or may hit other data depending on the addressed generated when the
> erroneous per cpu offset is applied. In practice, both cases have been
> observed on arm64 hardware (the former being silent, but detectable with
> additional patches).
>
> To avoid issues resulting from this, we must iterate over the set of
> *possible* cpus for a given leaf node. This patch adds new helpers to
> enable this (also unifying and simplifying some related bitmask
> manipulation logic), and moves the RCU tree code over to them.
>
> Without this patch, running reboot at a shell can result in an oops
> like:
Very good, this one applies cleanly and I have queued it for review
and testing.
One question below, though.
Thanx, Paul
> [ 3369.075979] Unable to handle kernel paging request at virtual address ffffff8008b21b4c
> [ 3369.083881] pgd = ffffffc3ecdda000
> [ 3369.087270] [ffffff8008b21b4c] *pgd=00000083eca48003, *pud=00000083eca48003, *pmd=0000000000000000
> [ 3369.096222] Internal error: Oops: 96000007 [#1] PREEMPT SMP
> [ 3369.101781] Modules linked in:
> [ 3369.104825] CPU: 2 PID: 1817 Comm: NetworkManager Tainted: G W 4.6.0+ #3
> [ 3369.121239] task: ffffffc0fa13e000 ti: ffffffc3eb940000 task.ti: ffffffc3eb940000
> [ 3369.128708] PC is at sync_rcu_exp_select_cpus+0x188/0x510
> [ 3369.134094] LR is at sync_rcu_exp_select_cpus+0x104/0x510
> [ 3369.139479] pc : [<ffffff80081109a8>] lr : [<ffffff8008110924>] pstate: 200001c5
> [ 3369.146860] sp : ffffffc3eb9435a0
> [ 3369.150162] x29: ffffffc3eb9435a0 x28: ffffff8008be4f88
> [ 3369.155465] x27: ffffff8008b66c80 x26: ffffffc3eceb2600
> [ 3369.160767] x25: 0000000000000001 x24: ffffff8008be4f88
> [ 3369.166070] x23: ffffff8008b51c3c x22: ffffff8008b66c80
> [ 3369.171371] x21: 0000000000000001 x20: ffffff8008b21b40
> [ 3369.176673] x19: ffffff8008b66c80 x18: 0000000000000000
> [ 3369.181975] x17: 0000007fa951a010 x16: ffffff80086a30f0
> [ 3369.187278] x15: 0000007fa9505590 x14: 0000000000000000
> [ 3369.192580] x13: ffffff8008b51000 x12: ffffffc3eb940000
> [ 3369.197882] x11: 0000000000000006 x10: ffffff8008b51b78
> [ 3369.203184] x9 : 0000000000000001 x8 : ffffff8008be4000
> [ 3369.208486] x7 : ffffff8008b21b40 x6 : 0000000000001003
> [ 3369.213788] x5 : 0000000000000000 x4 : ffffff8008b27280
> [ 3369.219090] x3 : ffffff8008b21b4c x2 : 0000000000000001
> [ 3369.224406] x1 : 0000000000000001 x0 : 0000000000000140
> ...
> [ 3369.972257] [<ffffff80081109a8>] sync_rcu_exp_select_cpus+0x188/0x510
> [ 3369.978685] [<ffffff80081128b4>] synchronize_rcu_expedited+0x64/0xa8
> [ 3369.985026] [<ffffff80086b987c>] synchronize_net+0x24/0x30
> [ 3369.990499] [<ffffff80086ddb54>] dev_deactivate_many+0x28c/0x298
> [ 3369.996493] [<ffffff80086b6bb8>] __dev_close_many+0x60/0xd0
> [ 3370.002052] [<ffffff80086b6d48>] __dev_close+0x28/0x40
> [ 3370.007178] [<ffffff80086bf62c>] __dev_change_flags+0x8c/0x158
> [ 3370.012999] [<ffffff80086bf718>] dev_change_flags+0x20/0x60
> [ 3370.018558] [<ffffff80086cf7f0>] do_setlink+0x288/0x918
> [ 3370.023771] [<ffffff80086d0798>] rtnl_newlink+0x398/0x6a8
> [ 3370.029158] [<ffffff80086cee84>] rtnetlink_rcv_msg+0xe4/0x220
> [ 3370.034891] [<ffffff80086e274c>] netlink_rcv_skb+0xc4/0xf8
> [ 3370.040364] [<ffffff80086ced8c>] rtnetlink_rcv+0x2c/0x40
> [ 3370.045663] [<ffffff80086e1fe8>] netlink_unicast+0x160/0x238
> [ 3370.051309] [<ffffff80086e24b8>] netlink_sendmsg+0x2f0/0x358
> [ 3370.056956] [<ffffff80086a0070>] sock_sendmsg+0x18/0x30
> [ 3370.062168] [<ffffff80086a21cc>] ___sys_sendmsg+0x26c/0x280
> [ 3370.067728] [<ffffff80086a30ac>] __sys_sendmsg+0x44/0x88
> [ 3370.073027] [<ffffff80086a3100>] SyS_sendmsg+0x10/0x20
> [ 3370.078153] [<ffffff8008085e70>] el0_svc_naked+0x24/0x28
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Reported-by: Dennis Chen <dennis.chen@arm.com>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Steve Capper <steve.capper@arm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: linux-kernel@vger.kernel.org
> ---
> kernel/rcu/tree.c | 19 +++++++++----------
> kernel/rcu/tree.h | 18 ++++++++++++++++++
> kernel/rcu/tree_exp.h | 12 ++++--------
> kernel/rcu/tree_plugin.h | 5 +++--
> 4 files changed, 34 insertions(+), 20 deletions(-)
>
> Since v1 [1]:
> * rebase to the -rcu rcu/dev branch.
> * replace all occurences missed by v1.
> * s/CPUS/CPUs/ in commit message. Gah.
>
> Paul, I've given this a spin on arm64 and I've build-tested for x86.
> Things look fine, though this hasn't seen a thorough beating.
>
> Mark.
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index afdcb7b..65ee19e 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1279,15 +1279,16 @@ static void rcu_check_gp_kthread_starvation(struct rcu_state *rsp)
> static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
> {
> int cpu;
> + unsigned long bit;
> unsigned long flags;
> struct rcu_node *rnp;
>
> rcu_for_each_leaf_node(rsp, rnp) {
> raw_spin_lock_irqsave_rcu_node(rnp, flags);
> if (rnp->qsmask != 0) {
> - for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
> - if (rnp->qsmask & (1UL << cpu))
> - dump_cpu_task(rnp->grplo + cpu);
> + for_each_leaf_node_possible_cpu_bit(rnp, cpu, bit)
> + if (rnp->qsmask & bit)
> + dump_cpu_task(cpu);
> }
> raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> }
> @@ -1316,6 +1317,7 @@ static void rcu_stall_kick_kthreads(struct rcu_state *rsp)
> static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
> {
> int cpu;
> + unsigned long bit;
> long delta;
> unsigned long flags;
> unsigned long gpa;
> @@ -1353,10 +1355,9 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
> raw_spin_lock_irqsave_rcu_node(rnp, flags);
> ndetected += rcu_print_task_stall(rnp);
> if (rnp->qsmask != 0) {
> - for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
> - if (rnp->qsmask & (1UL << cpu)) {
> - print_cpu_stall_info(rsp,
> - rnp->grplo + cpu);
> + for_each_leaf_node_possible_cpu_bit(rnp, cpu, bit)
> + if (rnp->qsmask & bit) {
> + print_cpu_stall_info(rsp, cpu);
> ndetected++;
> }
> }
> @@ -2908,9 +2909,7 @@ static void force_qs_rnp(struct rcu_state *rsp,
> continue;
> }
> }
> - cpu = rnp->grplo;
> - bit = 1;
> - for (; cpu <= rnp->grphi; cpu++, bit <<= 1) {
> + for_each_leaf_node_possible_cpu_bit(rnp, cpu, bit) {
> if ((rnp->qsmask & bit) != 0) {
> if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
> mask |= bit;
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index e3959f5..f73444f 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -281,6 +281,24 @@ struct rcu_node {
> (rnp) < &(rsp)->node[rcu_num_nodes]; (rnp)++)
>
> /*
> + * Iterate over all possible CPUs in a leaf RCU node.
> + */
> +#define for_each_leaf_node_possible_cpu(rnp, cpu) \
> + for ((cpu) = rnp->grplo; \
> + cpu <= rnp->grphi; \
> + cpu = cpumask_next((cpu), cpu_possible_mask))
What if the rnp->grplo corresponds to a non-existent CPU? I admit that
this is perhaps unlikely given the usual power-of-two defaults and
pwoer-of-two computer-system structure, but...
Would something like this handle that possibility?
+#define for_each_leaf_node_possible_cpu(rnp, cpu) \
+ for ((cpu) = cpumask_next(rnp->grplo - 1, cpu_possible_mask); \
+ cpu <= rnp->grphi; \
+ cpu = cpumask_next((cpu), cpu_possible_mask))
Or maybe like this, with less duplicated code but very strange style:
+#define for_each_leaf_node_possible_cpu(rnp, cpu) \
+ for ((cpu) = rnp->grplo - 1; \
+ cpu = cpumask_next((cpu), cpu_possible_mask), cpu <= rnp->grphi; 1)
The first one is probably far better, assuming that it works, but I could
not resist inflicting the second one on you. ;-)
> +
> +/*
> + * Iterate over all possible CPUs in a leaf RCU node, at each step providing a
> + * bit for comparison against rcu_node bitmasks.
> + */
> +#define for_each_leaf_node_possible_cpu_bit(rnp, cpu, bit) \
> + for ((cpu) = rnp->grplo, (bit) = 1; \
> + cpu <= rnp->grphi; \
> + cpu = cpumask_next((cpu), cpu_possible_mask), \
> + (bit) = 1UL << (cpu - rnp->grplo))
Same question here.
> +
> +/*
> * Union to allow "aggregate OR" operation on the need for a quiescent
> * state by the normal and expedited grace periods.
> */
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 00a02a2..6655a44 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -356,7 +356,7 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
>
> /* Each pass checks a CPU for identity, offline, and idle. */
> mask_ofl_test = 0;
> - for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++) {
> + for_each_leaf_node_possible_cpu(rnp, cpu) {
> struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
> struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
>
> @@ -376,8 +376,7 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
> raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>
> /* IPI the remaining CPUs for expedited quiescent state. */
> - mask = 1;
> - for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
> + for_each_leaf_node_possible_cpu_bit(rnp, cpu, mask) {
> if (!(mask_ofl_ipi & mask))
> continue;
> retry_ipi:
> @@ -440,8 +439,7 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
> ndetected = 0;
> rcu_for_each_leaf_node(rsp, rnp) {
> ndetected += rcu_print_task_exp_stall(rnp);
> - mask = 1;
> - for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
> + for_each_leaf_node_possible_cpu_bit(rnp, cpu, mask) {
> struct rcu_data *rdp;
>
> if (!(rnp->expmask & mask))
> @@ -453,7 +451,6 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
> "o."[!!(rdp->grpmask & rnp->expmaskinit)],
> "N."[!!(rdp->grpmask & rnp->expmaskinitnext)]);
> }
> - mask <<= 1;
> }
> pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
> jiffies - jiffies_start, rsp->expedited_sequence,
> @@ -473,8 +470,7 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
> pr_cont("\n");
> }
> rcu_for_each_leaf_node(rsp, rnp) {
> - mask = 1;
> - for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
> + for_each_leaf_node_possible_cpu_bit(rnp, cpu, mask) {
> if (!(rnp->expmask & mask))
> continue;
> dump_cpu_task(cpu);
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 02a9197..fd6b0f7 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1157,6 +1157,7 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> {
> struct task_struct *t = rnp->boost_kthread_task;
> unsigned long mask = rcu_rnp_online_cpus(rnp);
> + unsigned long bit;
> cpumask_var_t cm;
> int cpu;
>
> @@ -1164,8 +1165,8 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> return;
> if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
> return;
> - for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask >>= 1)
> - if ((mask & 0x1) && cpu != outgoingcpu)
> + for_each_leaf_node_cpu_bit(rnp, cpu, bit)
> + if ((mask & bit) && cpu != outgoingcpu)
> cpumask_set_cpu(cpu, cm);
> if (cpumask_weight(cm) == 0)
> cpumask_setall(cm);
> --
> 1.9.1
>
next prev parent reply other threads:[~2016-05-17 19:01 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-16 16:48 [PATCH] rcu: tree: correctly handle sparse possible CPUs Mark Rutland
2016-05-16 19:19 ` Paul E. McKenney
2016-05-17 10:22 ` [PATCHv2] " Mark Rutland
2016-05-17 19:01 ` Paul E. McKenney [this message]
2016-05-13 1:27 ` Mark Rutland
2016-05-18 0:12 ` Paul E. McKenney
2016-05-18 12:02 ` Arnd Bergmann
2016-05-18 18:15 ` Mark Rutland
2016-05-18 18:41 ` Paul E. McKenney
2016-05-20 10:30 ` kbuild test robot
2016-05-18 15:15 ` [PATCH] " Andrey Ryabinin
2016-05-18 18:01 ` Paul E. McKenney
2016-05-18 18:30 ` Mark Rutland
2016-05-18 18:44 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160517190106.GJ3528@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=catalin.marinas@arm.com \
--cc=dennis.chen@arm.com \
--cc=jiangshanlai@gmail.com \
--cc=josh@joshtriplett.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mathieu.desnoyers@efficios.com \
--cc=rostedt@goodmis.org \
--cc=steve.capper@arm.com \
--cc=will.deacon@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.