* [PATCH 0/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice @ 2023-07-05 7:30 Zhen Lei 2023-07-05 7:30 ` [PATCH 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() Zhen Lei 2023-07-05 7:30 ` [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice Zhen Lei 0 siblings, 2 replies; 12+ messages in thread From: Zhen Lei @ 2023-07-05 7:30 UTC (permalink / raw) To: Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu, linux-kernel Cc: Zhen Lei The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP kthread last ran is stalled, its stack does not need to be dumped again. For example: Please search "Sending NMI from CPU 1 to CPUs 0" rcu: INFO: rcu_sched self-detected stall on CPU rcu: 1-...!: (999 ticks this GP) idle=a1e4/1/0x40000002 softirq=116/116 fqs=0 rcu: (t=1000 jiffies g=-875 q=18 ncpus=4) rcu: rcu_sched kthread timer wakeup didn't happen for 999 jiffies! g-875 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 rcu: Possible timer handling issue on cpu=0 timer-softirq=449 rcu: rcu_sched kthread starved for 1000 jiffies! g-875 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0 rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_sched state:I stack:0 pid:12 ppid:2 flags:0x00000000 __schedule from schedule+0x50/0xa4 schedule from schedule_timeout+0x1f8/0x328 schedule_timeout from rcu_gp_fqs_loop+0x330/0x464 rcu_gp_fqs_loop from rcu_gp_kthread+0xb0/0x200 rcu_gp_kthread from kthread+0xe8/0x104 kthread from ret_from_fork+0x14/0x2c Exception stack(0xc0855fb0 to 0xc0855ff8) 5fa0: 00000000 00000000 00000000 00000000 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 rcu: Stack dump where RCU GP kthread last ran: Sending NMI from CPU 1 to CPUs 0: NMI backtrace for cpu 0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-rc1+ #2 Hardware name: ARM-Versatile Express PC is at ktime_get+0x4c/0xe8 LR is at ktime_get+0x4c/0xe8 pc : [<801a61a4>] lr : [<801a61a4>] psr: 60000113 sp : 80d01e48 ip : 00000002 fp : 0000001a r10: 5befcd40 r9 : 431bde82 r8 : d7b634db r7 : 00001bb0 r6 : 9ad70c88 r5 : 00000002 r4 : 80db0f40 r3 : ffffffff r2 : f8a7b162 r1 : 00000000 r0 : 07584e9d Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: 6000406a DAC: 00000051 ktime_get from tst_softirq+0x30/0xfc tst_softirq from __do_softirq+0x128/0x334 __do_softirq from irq_exit+0x108/0x12c irq_exit from __irq_svc+0x88/0xb0 Exception stack(0x80d01f18 to 0x80d01f60) 1f00: 00490d54 00000001 1f20: 80d07fc0 00000000 80d9d260 80d04cd0 00000001 80d04d18 80c5ec18 00000000 1f40: 80d9bc35 80d07fc0 80d9cc80 80d01f68 808d6a6c 808d79f8 60000013 ffffffff __irq_svc from default_idle_call+0x4c/0xb4 default_idle_call from do_idle+0x1a8/0x288 do_idle from cpu_startup_entry+0x18/0x1c cpu_startup_entry from rest_init+0xb4/0xb8 rest_init from arch_post_acpi_subsys_init+0x0/0x8 Sending NMI from CPU 1 to CPUs 0: NMI backtrace for cpu 0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-rc1+ #2 Hardware name: ARM-Versatile Express PC is at ktime_get+0x4c/0xe8 LR is at ktime_get+0x4c/0xe8 pc : [<801a61a4>] lr : [<801a61a4>] psr: 60000113 sp : 80d01e48 ip : 00000002 fp : 0000001a r10: 5befcd40 r9 : 431bde82 r8 : d7b634db r7 : 00001bb2 r6 : 9ad70c88 r5 : 00000002 r4 : 80db0f40 r3 : ffffffff r2 : f8a78d88 r1 : 00000000 r0 : 07587277 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: 6000406a DAC: 00000051 ktime_get from tst_softirq+0x30/0xfc tst_softirq from __do_softirq+0x128/0x334 __do_softirq from irq_exit+0x108/0x12c irq_exit from __irq_svc+0x88/0xb0 Exception stack(0x80d01f18 to 0x80d01f60) 1f00: 00490d54 00000001 1f20: 80d07fc0 00000000 80d9d260 80d04cd0 00000001 80d04d18 80c5ec18 00000000 1f40: 80d9bc35 80d07fc0 80d9cc80 80d01f68 808d6a6c 808d79f8 60000013 ffffffff __irq_svc from default_idle_call+0x4c/0xb4 default_idle_call from do_idle+0x1a8/0x288 do_idle from cpu_startup_entry+0x18/0x1c cpu_startup_entry from rest_init+0xb4/0xb8 rest_init from arch_post_acpi_subsys_init+0x0/0x8 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.4.0-rc1+ #2 Hardware name: ARM-Versatile Express PC is at default_idle_call+0x4c/0xb4 LR is at ct_kernel_enter.constprop.5+0x44/0x11c pc : [<808d79f8>] lr : [<808d6a6c>] psr: 60000013 sp : c085df98 ip : 80d9cc80 fp : 81126e80 r10: 80d9bc35 r9 : 00000000 r8 : 80c5ec18 r7 : 80d04d18 r6 : 00000002 r5 : 80d04cd0 r4 : 80d9d260 r3 : 00000000 r2 : 81126e80 r1 : 00000001 r0 : 0050a1e4 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: 6802806a DAC: 00000051 default_idle_call from do_idle+0x1a8/0x288 do_idle from cpu_startup_entry+0x18/0x1c cpu_startup_entry from secondary_start_kernel+0x14c/0x150 secondary_start_kernel from 0x60101660 Sending NMI from CPU 1 to CPUs 3: NMI backtrace for cpu 3 skipped: idling at default_idle_call+0x4c/0xb4 Zhen Lei (2): rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice kernel/rcu/tree_stall.h | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() 2023-07-05 7:30 [PATCH 0/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice Zhen Lei @ 2023-07-05 7:30 ` Zhen Lei 2023-07-10 19:03 ` Paul E. McKenney 2023-07-05 7:30 ` [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice Zhen Lei 1 sibling, 1 reply; 12+ messages in thread From: Zhen Lei @ 2023-07-05 7:30 UTC (permalink / raw) To: Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu, linux-kernel Cc: Zhen Lei The above condition "if (gpk)" already ensures that gp_kthread is created, so the local variable 'cpu' cannot be negative here. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> --- kernel/rcu/tree_stall.h | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h index b10b8349bb2a48b..dcfaa3d5db2cbc7 100644 --- a/kernel/rcu/tree_stall.h +++ b/kernel/rcu/tree_stall.h @@ -537,13 +537,11 @@ static void rcu_check_gp_kthread_starvation(void) pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name); pr_err("RCU grace-period kthread stack dump:\n"); sched_show_task(gpk); - if (cpu >= 0) { - if (cpu_is_offline(cpu)) { - pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); - } else { - pr_err("Stack dump where RCU GP kthread last ran:\n"); - dump_cpu_task(cpu); - } + if (cpu_is_offline(cpu)) { + pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); + } else { + pr_err("Stack dump where RCU GP kthread last ran:\n"); + dump_cpu_task(cpu); } wake_up_process(gpk); } -- 2.25.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() 2023-07-05 7:30 ` [PATCH 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() Zhen Lei @ 2023-07-10 19:03 ` Paul E. McKenney 2023-07-11 3:20 ` Leizhen (ThunderTown) 0 siblings, 1 reply; 12+ messages in thread From: Paul E. McKenney @ 2023-07-10 19:03 UTC (permalink / raw) To: Zhen Lei Cc: Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu, linux-kernel On Wed, Jul 05, 2023 at 03:30:19PM +0800, Zhen Lei wrote: > The above condition "if (gpk)" already ensures that gp_kthread is created, > so the local variable 'cpu' cannot be negative here. > > Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> > --- > kernel/rcu/tree_stall.h | 12 +++++------- > 1 file changed, 5 insertions(+), 7 deletions(-) > > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h > index b10b8349bb2a48b..dcfaa3d5db2cbc7 100644 > --- a/kernel/rcu/tree_stall.h > +++ b/kernel/rcu/tree_stall.h > @@ -537,13 +537,11 @@ static void rcu_check_gp_kthread_starvation(void) > pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name); > pr_err("RCU grace-period kthread stack dump:\n"); > sched_show_task(gpk); > - if (cpu >= 0) { I am not quite this trusting of the relation between the relationship between the existence of the grace-period khread and its CPU number being in range. Let's please start with something like this: if (!WARN_ON_ONCE(cpu < 0)) { Please note that this is not just me. See for example the use of the cpumask_check() function, albeit the opposite concern. > - if (cpu_is_offline(cpu)) { > - pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); > - } else { > - pr_err("Stack dump where RCU GP kthread last ran:\n"); > - dump_cpu_task(cpu); > - } > + if (cpu_is_offline(cpu)) { > + pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); > + } else { > + pr_err("Stack dump where RCU GP kthread last ran:\n"); > + dump_cpu_task(cpu); > } > wake_up_process(gpk); > } > -- > 2.25.1 > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() 2023-07-10 19:03 ` Paul E. McKenney @ 2023-07-11 3:20 ` Leizhen (ThunderTown) 2023-07-11 16:48 ` Paul E. McKenney 0 siblings, 1 reply; 12+ messages in thread From: Leizhen (ThunderTown) @ 2023-07-11 3:20 UTC (permalink / raw) To: paulmck Cc: Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu, linux-kernel On 2023/7/11 3:03, Paul E. McKenney wrote: > On Wed, Jul 05, 2023 at 03:30:19PM +0800, Zhen Lei wrote: >> The above condition "if (gpk)" already ensures that gp_kthread is created, >> so the local variable 'cpu' cannot be negative here. >> >> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> >> --- >> kernel/rcu/tree_stall.h | 12 +++++------- >> 1 file changed, 5 insertions(+), 7 deletions(-) >> >> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h >> index b10b8349bb2a48b..dcfaa3d5db2cbc7 100644 >> --- a/kernel/rcu/tree_stall.h >> +++ b/kernel/rcu/tree_stall.h >> @@ -537,13 +537,11 @@ static void rcu_check_gp_kthread_starvation(void) >> pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name); >> pr_err("RCU grace-period kthread stack dump:\n"); >> sched_show_task(gpk); >> - if (cpu >= 0) { > > I am not quite this trusting of the relation between the relationship > between the existence of the grace-period khread and its CPU number > being in range. Let's please start with something like this: > > if (!WARN_ON_ONCE(cpu < 0)) { > > Please note that this is not just me. See for example the use of the > cpumask_check() function, albeit the opposite concern. git grep -wn "\->cpu" kernel/ include/ kernel/kthread.c:583: to_kthread(p)->cpu = cpu; //kthread_create_on_cpu() kernel/sched/sched.h:2024: WRITE_ONCE(task_thread_info(p)->cpu, cpu); //__set_task_cpu() include/linux/sched.h:2250: return READ_ONCE(task_thread_info(p)->cpu); //task_cpu() git grep -wn "\.cpu" kernel/ include/ //There is no task related, the search result is omitted. Therefore, there is only one path "set_task_cpu()-->__set_task_cpu()" that can dynamically change the value of task_cpu(p). In fact, this guarantee has been made in set_task_cpu(). set_task_cpu WARN_ON_ONCE(!cpu_online(new_cpu)); __set_task_cpu(p, new_cpu); In addition, task_struct has member 'on_rq'. Therefore, when a task leaves the scheduling queue, setting the member 'cpu' to an invalid value will be thankless. Sorry, these two patches was posted too quickly, and I'm still regretting that I should have attached this to the commit description these days. > >> - if (cpu_is_offline(cpu)) { >> - pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); >> - } else { >> - pr_err("Stack dump where RCU GP kthread last ran:\n"); >> - dump_cpu_task(cpu); >> - } >> + if (cpu_is_offline(cpu)) { >> + pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); >> + } else { >> + pr_err("Stack dump where RCU GP kthread last ran:\n"); >> + dump_cpu_task(cpu); >> } >> wake_up_process(gpk); >> } >> -- >> 2.25.1 >> > . > -- Regards, Zhen Lei ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() 2023-07-11 3:20 ` Leizhen (ThunderTown) @ 2023-07-11 16:48 ` Paul E. McKenney 2023-07-13 2:03 ` Leizhen (ThunderTown) 0 siblings, 1 reply; 12+ messages in thread From: Paul E. McKenney @ 2023-07-11 16:48 UTC (permalink / raw) To: Leizhen (ThunderTown) Cc: Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu, linux-kernel On Tue, Jul 11, 2023 at 11:20:07AM +0800, Leizhen (ThunderTown) wrote: > > > On 2023/7/11 3:03, Paul E. McKenney wrote: > > On Wed, Jul 05, 2023 at 03:30:19PM +0800, Zhen Lei wrote: > >> The above condition "if (gpk)" already ensures that gp_kthread is created, > >> so the local variable 'cpu' cannot be negative here. > >> > >> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> > >> --- > >> kernel/rcu/tree_stall.h | 12 +++++------- > >> 1 file changed, 5 insertions(+), 7 deletions(-) > >> > >> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h > >> index b10b8349bb2a48b..dcfaa3d5db2cbc7 100644 > >> --- a/kernel/rcu/tree_stall.h > >> +++ b/kernel/rcu/tree_stall.h > >> @@ -537,13 +537,11 @@ static void rcu_check_gp_kthread_starvation(void) > >> pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name); > >> pr_err("RCU grace-period kthread stack dump:\n"); > >> sched_show_task(gpk); > >> - if (cpu >= 0) { > > > > I am not quite this trusting of the relation between the relationship > > between the existence of the grace-period khread and its CPU number > > being in range. Let's please start with something like this: > > > > if (!WARN_ON_ONCE(cpu < 0)) { > > > > Please note that this is not just me. See for example the use of the > > cpumask_check() function, albeit the opposite concern. > > git grep -wn "\->cpu" kernel/ include/ > kernel/kthread.c:583: to_kthread(p)->cpu = cpu; //kthread_create_on_cpu() > kernel/sched/sched.h:2024: WRITE_ONCE(task_thread_info(p)->cpu, cpu); //__set_task_cpu() > include/linux/sched.h:2250: return READ_ONCE(task_thread_info(p)->cpu); //task_cpu() > > git grep -wn "\.cpu" kernel/ include/ //There is no task related, the search result is omitted. > > Therefore, there is only one path "set_task_cpu()-->__set_task_cpu()" that can dynamically > change the value of task_cpu(p). In fact, this guarantee has been made in set_task_cpu(). > set_task_cpu > WARN_ON_ONCE(!cpu_online(new_cpu)); > __set_task_cpu(p, new_cpu); > > In addition, task_struct has member 'on_rq'. Therefore, when a task leaves the scheduling > queue, setting the member 'cpu' to an invalid value will be thankless. Thank you for digging into this! Given that, as you say, we can dispense with the check. > Sorry, these two patches was posted too quickly, and I'm still regretting that I should have > attached this to the commit description these days. Please do resend the patches with this explanation in the commit log. And please don't worry about making the English pretty, as I can always wordsmith. Thanx, Paul > >> - if (cpu_is_offline(cpu)) { > >> - pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); > >> - } else { > >> - pr_err("Stack dump where RCU GP kthread last ran:\n"); > >> - dump_cpu_task(cpu); > >> - } > >> + if (cpu_is_offline(cpu)) { > >> + pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); > >> + } else { > >> + pr_err("Stack dump where RCU GP kthread last ran:\n"); > >> + dump_cpu_task(cpu); > >> } > >> wake_up_process(gpk); > >> } > >> -- > >> 2.25.1 > >> > > . > > > > -- > Regards, > Zhen Lei ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() 2023-07-11 16:48 ` Paul E. McKenney @ 2023-07-13 2:03 ` Leizhen (ThunderTown) 0 siblings, 0 replies; 12+ messages in thread From: Leizhen (ThunderTown) @ 2023-07-13 2:03 UTC (permalink / raw) To: paulmck Cc: Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu, linux-kernel On 2023/7/12 0:48, Paul E. McKenney wrote: > On Tue, Jul 11, 2023 at 11:20:07AM +0800, Leizhen (ThunderTown) wrote: >> >> >> On 2023/7/11 3:03, Paul E. McKenney wrote: >>> On Wed, Jul 05, 2023 at 03:30:19PM +0800, Zhen Lei wrote: >>>> The above condition "if (gpk)" already ensures that gp_kthread is created, >>>> so the local variable 'cpu' cannot be negative here. >>>> >>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> >>>> --- >>>> kernel/rcu/tree_stall.h | 12 +++++------- >>>> 1 file changed, 5 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h >>>> index b10b8349bb2a48b..dcfaa3d5db2cbc7 100644 >>>> --- a/kernel/rcu/tree_stall.h >>>> +++ b/kernel/rcu/tree_stall.h >>>> @@ -537,13 +537,11 @@ static void rcu_check_gp_kthread_starvation(void) >>>> pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name); >>>> pr_err("RCU grace-period kthread stack dump:\n"); >>>> sched_show_task(gpk); >>>> - if (cpu >= 0) { >>> >>> I am not quite this trusting of the relation between the relationship >>> between the existence of the grace-period khread and its CPU number >>> being in range. Let's please start with something like this: >>> >>> if (!WARN_ON_ONCE(cpu < 0)) { >>> >>> Please note that this is not just me. See for example the use of the >>> cpumask_check() function, albeit the opposite concern. >> >> git grep -wn "\->cpu" kernel/ include/ >> kernel/kthread.c:583: to_kthread(p)->cpu = cpu; //kthread_create_on_cpu() >> kernel/sched/sched.h:2024: WRITE_ONCE(task_thread_info(p)->cpu, cpu); //__set_task_cpu() >> include/linux/sched.h:2250: return READ_ONCE(task_thread_info(p)->cpu); //task_cpu() >> >> git grep -wn "\.cpu" kernel/ include/ //There is no task related, the search result is omitted. >> >> Therefore, there is only one path "set_task_cpu()-->__set_task_cpu()" that can dynamically >> change the value of task_cpu(p). In fact, this guarantee has been made in set_task_cpu(). >> set_task_cpu >> WARN_ON_ONCE(!cpu_online(new_cpu)); >> __set_task_cpu(p, new_cpu); >> >> In addition, task_struct has member 'on_rq'. Therefore, when a task leaves the scheduling >> queue, setting the member 'cpu' to an invalid value will be thankless. > > Thank you for digging into this! Given that, as you say, we can dispense > with the check. > >> Sorry, these two patches was posted too quickly, and I'm still regretting that I should have >> attached this to the commit description these days. > > Please do resend the patches with this explanation in the commit log. > And please don't worry about making the English pretty, as I can always > wordsmith. OK, thank you very much. > > Thanx, Paul > >>>> - if (cpu_is_offline(cpu)) { >>>> - pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); >>>> - } else { >>>> - pr_err("Stack dump where RCU GP kthread last ran:\n"); >>>> - dump_cpu_task(cpu); >>>> - } >>>> + if (cpu_is_offline(cpu)) { >>>> + pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); >>>> + } else { >>>> + pr_err("Stack dump where RCU GP kthread last ran:\n"); >>>> + dump_cpu_task(cpu); >>>> } >>>> wake_up_process(gpk); >>>> } >>>> -- >>>> 2.25.1 >>>> >>> . >>> >> >> -- >> Regards, >> Zhen Lei > . > -- Regards, Zhen Lei ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice 2023-07-05 7:30 [PATCH 0/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice Zhen Lei 2023-07-05 7:30 ` [PATCH 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() Zhen Lei @ 2023-07-05 7:30 ` Zhen Lei 2023-07-05 8:17 ` Leizhen (ThunderTown) 2023-07-10 19:05 ` Paul E. McKenney 1 sibling, 2 replies; 12+ messages in thread From: Zhen Lei @ 2023-07-05 7:30 UTC (permalink / raw) To: Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu, linux-kernel Cc: Zhen Lei The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP kthread last ran is stalled, its stack does not need to be dumped again. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> --- kernel/rcu/tree_stall.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h index dcfaa3d5db2cbc7..cc884cd49e026a3 100644 --- a/kernel/rcu/tree_stall.h +++ b/kernel/rcu/tree_stall.h @@ -534,12 +534,14 @@ static void rcu_check_gp_kthread_starvation(void) data_race(READ_ONCE(rcu_state.gp_state)), gpk ? data_race(READ_ONCE(gpk->__state)) : ~0, cpu); if (gpk) { + struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); + pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name); pr_err("RCU grace-period kthread stack dump:\n"); sched_show_task(gpk); if (cpu_is_offline(cpu)) { pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); - } else { + } else if (!(data_race(READ_ONCE(rdp->mynode->qsmask)) & rdp->grpmask)) { pr_err("Stack dump where RCU GP kthread last ran:\n"); dump_cpu_task(cpu); } -- 2.25.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice 2023-07-05 7:30 ` [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice Zhen Lei @ 2023-07-05 8:17 ` Leizhen (ThunderTown) 2023-07-10 19:05 ` Paul E. McKenney 1 sibling, 0 replies; 12+ messages in thread From: Leizhen (ThunderTown) @ 2023-07-05 8:17 UTC (permalink / raw) To: Paul E . McKenney, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu, linux-kernel On 2023/7/5 15:30, Zhen Lei wrote: > The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP > kthread last ran is stalled, its stack does not need to be dumped again. Should I add Fixes? Fixes: 243027a3c805 ("rcu: For RCU grace-period kthread starvation, dump last CPU it ran on") > > Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> > --- > kernel/rcu/tree_stall.h | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h > index dcfaa3d5db2cbc7..cc884cd49e026a3 100644 > --- a/kernel/rcu/tree_stall.h > +++ b/kernel/rcu/tree_stall.h > @@ -534,12 +534,14 @@ static void rcu_check_gp_kthread_starvation(void) > data_race(READ_ONCE(rcu_state.gp_state)), > gpk ? data_race(READ_ONCE(gpk->__state)) : ~0, cpu); > if (gpk) { > + struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); > + > pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name); > pr_err("RCU grace-period kthread stack dump:\n"); > sched_show_task(gpk); > if (cpu_is_offline(cpu)) { > pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); > - } else { > + } else if (!(data_race(READ_ONCE(rdp->mynode->qsmask)) & rdp->grpmask)) { > pr_err("Stack dump where RCU GP kthread last ran:\n"); > dump_cpu_task(cpu); > } > -- Regards, Zhen Lei ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice 2023-07-05 7:30 ` [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice Zhen Lei 2023-07-05 8:17 ` Leizhen (ThunderTown) @ 2023-07-10 19:05 ` Paul E. McKenney 2023-07-10 19:55 ` Joel Fernandes 1 sibling, 1 reply; 12+ messages in thread From: Paul E. McKenney @ 2023-07-10 19:05 UTC (permalink / raw) To: Zhen Lei Cc: Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu, linux-kernel On Wed, Jul 05, 2023 at 03:30:20PM +0800, Zhen Lei wrote: > The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP > kthread last ran is stalled, its stack does not need to be dumped again. > > Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> This one looks good. Please feel free to rebase it before 1/2 and repost. Thanx, Paul > --- > kernel/rcu/tree_stall.h | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h > index dcfaa3d5db2cbc7..cc884cd49e026a3 100644 > --- a/kernel/rcu/tree_stall.h > +++ b/kernel/rcu/tree_stall.h > @@ -534,12 +534,14 @@ static void rcu_check_gp_kthread_starvation(void) > data_race(READ_ONCE(rcu_state.gp_state)), > gpk ? data_race(READ_ONCE(gpk->__state)) : ~0, cpu); > if (gpk) { > + struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); > + > pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name); > pr_err("RCU grace-period kthread stack dump:\n"); > sched_show_task(gpk); > if (cpu_is_offline(cpu)) { > pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); > - } else { > + } else if (!(data_race(READ_ONCE(rdp->mynode->qsmask)) & rdp->grpmask)) { > pr_err("Stack dump where RCU GP kthread last ran:\n"); > dump_cpu_task(cpu); > } > -- > 2.25.1 > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice 2023-07-10 19:05 ` Paul E. McKenney @ 2023-07-10 19:55 ` Joel Fernandes 2023-07-10 20:32 ` Paul E. McKenney 0 siblings, 1 reply; 12+ messages in thread From: Joel Fernandes @ 2023-07-10 19:55 UTC (permalink / raw) To: paulmck Cc: Zhen Lei, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu, linux-kernel On Mon, Jul 10, 2023 at 3:06 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Wed, Jul 05, 2023 at 03:30:20PM +0800, Zhen Lei wrote: > > The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP > > kthread last ran is stalled, its stack does not need to be dumped again. > > > > Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> > > This one looks good. Please feel free to rebase it before 1/2 and repost. Just a small comment: I wondered if this would make it harder to identify which stack among the various CPU stacks corresponds to the one the GP kthread is running on. However, this line does print the CPU number of the thread, so it is perhaps not an issue: pr_err("%s kthread starved for %ld jiffies! g%ld f%#x %s(%d) ->state=%#x ->cpu=%d\n", Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Thanks. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice 2023-07-10 19:55 ` Joel Fernandes @ 2023-07-10 20:32 ` Paul E. McKenney 2023-07-11 3:26 ` Leizhen (ThunderTown) 0 siblings, 1 reply; 12+ messages in thread From: Paul E. McKenney @ 2023-07-10 20:32 UTC (permalink / raw) To: Joel Fernandes Cc: Zhen Lei, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu, linux-kernel On Mon, Jul 10, 2023 at 03:55:16PM -0400, Joel Fernandes wrote: > On Mon, Jul 10, 2023 at 3:06 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Wed, Jul 05, 2023 at 03:30:20PM +0800, Zhen Lei wrote: > > > The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP > > > kthread last ran is stalled, its stack does not need to be dumped again. > > > > > > Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> > > > > This one looks good. Please feel free to rebase it before 1/2 and repost. > > Just a small comment: > I wondered if this would make it harder to identify which stack among > the various CPU stacks corresponds to the one the GP kthread is > running on. However, this line does print the CPU number of the > thread, so it is perhaps not an issue: > > pr_err("%s kthread starved for %ld jiffies! g%ld f%#x > %s(%d) ->state=%#x ->cpu=%d\n", > > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Thank you! Zhen Lei, please feel free to add Joel's Reviewed-by on your next posting. Thanx, Paul ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice 2023-07-10 20:32 ` Paul E. McKenney @ 2023-07-11 3:26 ` Leizhen (ThunderTown) 0 siblings, 0 replies; 12+ messages in thread From: Leizhen (ThunderTown) @ 2023-07-11 3:26 UTC (permalink / raw) To: paulmck, Joel Fernandes Cc: Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu, linux-kernel On 2023/7/11 4:32, Paul E. McKenney wrote: > On Mon, Jul 10, 2023 at 03:55:16PM -0400, Joel Fernandes wrote: >> On Mon, Jul 10, 2023 at 3:06 PM Paul E. McKenney <paulmck@kernel.org> wrote: >>> >>> On Wed, Jul 05, 2023 at 03:30:20PM +0800, Zhen Lei wrote: >>>> The stacks of all stalled CPUs will be dumped. If the CPU on where RCU GP >>>> kthread last ran is stalled, its stack does not need to be dumped again. >>>> >>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> >>> >>> This one looks good. Please feel free to rebase it before 1/2 and repost. >> >> Just a small comment: >> I wondered if this would make it harder to identify which stack among >> the various CPU stacks corresponds to the one the GP kthread is >> running on. However, this line does print the CPU number of the >> thread, so it is perhaps not an issue: >> >> pr_err("%s kthread starved for %ld jiffies! g%ld f%#x >> %s(%d) ->state=%#x ->cpu=%d\n", >> >> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > Thank you! Zhen Lei, please feel free to add Joel's Reviewed-by on your > next posting. OK > > Thanx, Paul > . > -- Regards, Zhen Lei ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2023-07-13 2:03 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-07-05 7:30 [PATCH 0/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice Zhen Lei 2023-07-05 7:30 ` [PATCH 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() Zhen Lei 2023-07-10 19:03 ` Paul E. McKenney 2023-07-11 3:20 ` Leizhen (ThunderTown) 2023-07-11 16:48 ` Paul E. McKenney 2023-07-13 2:03 ` Leizhen (ThunderTown) 2023-07-05 7:30 ` [PATCH 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice Zhen Lei 2023-07-05 8:17 ` Leizhen (ThunderTown) 2023-07-10 19:05 ` Paul E. McKenney 2023-07-10 19:55 ` Joel Fernandes 2023-07-10 20:32 ` Paul E. McKenney 2023-07-11 3:26 ` Leizhen (ThunderTown)
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox