From: Michael Neuling <mikey@neuling.org>
To: Darren Hart <dvhltc@us.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>,
Gautham R Shenoy <ego@in.ibm.com>,
Josh Triplett <josh@joshtriplett.org>,
Steven Rostedt <rostedt@goodmis.org>,
linuxppc-dev@ozlabs.org, Will Schmidt <willschm@us.ibm.com>,
Paul Mackerras <paulus@samba.org>,
Ankita Garg <ankita@in.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries
Date: Thu, 02 Sep 2010 11:02:02 +1000 [thread overview]
Message-ID: <11579.1283389322@neuling.org> (raw)
In-Reply-To: <4C7EBAA8.7030601@us.ibm.com>
In message <4C7EBAA8.7030601@us.ibm.com> you wrote:
> On 09/01/2010 12:59 PM, Steven Rostedt wrote:
> > On Wed, 2010-09-01 at 11:47 -0700, Darren Hart wrote:
> >
> >> from tip/rt/2.6.33 causes the preempt_count() to change across the cede
> >> call. This patch appears to prevents the proxy preempt_count assignment
> >> from happening. This non-local-cpu assignment to 0 would cause an
> >> underrun of preempt_count() if the local-cpu had disabled preemption
> >> prior to the assignment and then later tried to enable it. This appears
> >> to be the case with the stack of __trace_hcall* calls preceeding the
> >> return from extended_cede_processor() in the latency format trace-cmd
> >> report:
> >>
> >> <idle>-0 1d.... 201.252737: function: .cpu_die
> >
> > Note, the above 1d.... is a series of values. The first being the CPU,
> > the next if interrupts are disabled, the next if the NEED_RESCHED flag
> > is set, the next is softirqs enabled or disabled, next the
> > preempt_count, and finally the lockdepth count.
> >
> > Here we only care about the preempt_count, which is zero when '.' and a
> > number if it is something else. It is the second to last field in that
> > list.
> >
> >
> >> <idle>-0 1d.... 201.252738: function: .pseries_ma
ch_cpu_die
> >> <idle>-0 1d.... 201.252740: function: .idle_ta
sk_exit
> >> <idle>-0 1d.... 201.252741: function: .swit
ch_slb
> >> <idle>-0 1d.... 201.252742: function: .xics_te
ardown_cpu
> >> <idle>-0 1d.... 201.252743: function: .xics
_set_cpu_priority
> >> <idle>-0 1d.... 201.252744: function: .__trace_hcall
_entry
> >> <idle>-0 1d..1. 201.252745: function: .probe_hcal
l_entry
> >
> > ^
> > preempt_count set to 1
> >
> >> <idle>-0 1d..1. 201.252746: function: .__trace_hcall
_exit
> >> <idle>-0 1d..2. 201.252747: function: .probe_hcal
l_exit
> >> <idle>-0 1d.... 201.252748: function: .__trace_hcall
_entry
> >> <idle>-0 1d..1. 201.252748: function: .probe_hcal
l_entry
> >> <idle>-0 1d..1. 201.252750: function: .__trace_hcall
_exit
> >> <idle>-0 1d..2. 201.252751: function: .probe_hcal
l_exit
> >> <idle>-0 1d.... 201.252752: function: .__trace_hcall
_entry
> >> <idle>-0 1d..1. 201.252753: function: .probe_hcal
l_entry
> > ^ ^
> > CPU preempt_count
> >
> > Entering the function probe_hcall_entry() the preempt_count is 1 (see
> > below). But probe_hcall_entry does:
> >
> > h = &get_cpu_var(hcall_stats)[opcode / 4];
> >
> > Without doing the put (which it does in probe_hcall_exit())
> >
> > So exiting the probe_hcall_entry() the prempt_count is 2.
> > The trace_hcall_entry() will do a preempt_enable() making it leave as 1.
> >
> >
> >> offon.sh-3684 6..... 201.466488: bprint: .smp_pSeries_k
ick_cpu : resetting pcnt to 0 for cpu 1
> >
> > This is CPU 6, changing the preempt count from 1 to 0.
> >
> >>
> >> preempt_count() is reset from 1 to 0 by smp_startup_cpu() without the
> >> QCSS_NOT_STOPPED check from the patch above.
> >>
> >> <idle>-0 1d.... 201.466503: function: .__trace_hcall
_exit
> >
> > Note: __trace_hcall_exit() and __trace_hcall_entry() basically do:
> >
> > preempt_disable();
> > call probe();
> > preempt_enable();
> >
> >
> >> <idle>-0 1d..1. 201.466505: function: .probe_hcal
l_exit
> >
> > The preempt_count of 1 entering the probe_hcall_exit() is because of the
> > preempt_disable() shown above. It should have been entered as a 2.
> >
> > But then it does:
> >
> >
> > put_cpu_var(hcall_stats);
> >
> > making preempt_count 0.
> >
> > But the preempt_enable() in the trace_hcall_exit() causes this to be -1.
> >
> >
> >> <idle>-0 1d.Hff. 201.466507: bprint: .pseries_mach
_cpu_die : after cede: ffffffff
> >>
> >> With the preempt_count() being one less than it should be, the final
> >> preempt_enable() in the trace_hcall path drops preempt_count to
> >> 0xffffffff, which of course is an illegal value and leads to a crash.
> >
> > I'm confused to how this works in mainline?
>
> Turns out it didn't. 2.6.33.5 with CONFIG_PREEMPT=y sees this exact same
> behavior. The following, part of the 2.6.33.6 stable release, prevents
> this from happening:
>
> aef40e87d866355ffd279ab21021de733242d0d5
> powerpc/pseries: Only call start-cpu when a CPU is stopped
>
> --- a/arch/powerpc/platforms/pseries/smp.c
> +++ b/arch/powerpc/platforms/pseries/smp.c
> @@ -82,6 +82,12 @@ static inline int __devinit smp_startup_cpu(unsigned
> int lcpu)
>
> pcpu = get_hard_smp_processor_id(lcpu);
>
> + /* Check to see if the CPU out of FW already for kexec */
> + if (smp_query_cpu_stopped(pcpu) == QCSS_NOT_STOPPED){
> + cpu_set(lcpu, of_spin_map);
> + return 1;
> + }
> +
> /* Fixup atomic count: it exited inside IRQ handler. */
> task_thread_info(paca[lcpu].__current)->preempt_count = 0;
>
> The question is now, Is this the right fix? If so, perhaps we can update
> the comment to be a bit more clear and not refer solely to kexec.
>
> Michael Neuling, can you offer any thoughts here? We hit this EVERY
> TIME, which makes me wonder if the offline/online path could do this
> without calling smp_startup_cpu at all.
We need to call smp_startup_cpu on boot when we the cpus are still in
FW. smp_startup_cpu does this for us on boot.
I'm wondering if we just need to move the test down a bit to make sure
the preempt_count is set. I've not been following this thread, but
maybe this might work?
Untested patch below...
Mikey
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index 0317cce..3afaba4 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -104,18 +104,18 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
pcpu = get_hard_smp_processor_id(lcpu);
- /* Check to see if the CPU out of FW already for kexec */
- if (smp_query_cpu_stopped(pcpu) == QCSS_NOT_STOPPED){
- cpumask_set_cpu(lcpu, of_spin_mask);
- return 1;
- }
-
/* Fixup atomic count: it exited inside IRQ handler. */
task_thread_info(paca[lcpu].__current)->preempt_count = 0;
if (get_cpu_current_state(lcpu) == CPU_STATE_INACTIVE)
goto out;
+ /* Check to see if the CPU out of FW already for kexec */
+ if (smp_query_cpu_stopped(pcpu) == QCSS_NOT_STOPPED){
+ cpumask_set_cpu(lcpu, of_spin_mask);
+ return 1;
+ }
+
/*
* If the RTAS start-cpu token does not exist then presume the
* cpu is already spinning.
next prev parent reply other threads:[~2010-09-02 1:02 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-07-22 18:24 [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries Darren Hart
2010-07-22 18:36 ` Darren Hart
2010-07-22 18:38 ` Thomas Gleixner
2010-08-10 22:36 ` Darren Hart
2010-07-22 22:25 ` Benjamin Herrenschmidt
2010-07-22 23:57 ` Darren Hart
2010-07-23 4:44 ` Darren Hart
2010-07-23 5:08 ` Vaidyanathan Srinivasan
2010-07-23 5:11 ` Benjamin Herrenschmidt
2010-07-23 7:07 ` Vaidyanathan Srinivasan
2010-08-05 4:45 ` Darren Hart
2010-08-05 11:06 ` Vaidyanathan Srinivasan
2010-08-05 12:26 ` Thomas Gleixner
2010-07-23 5:09 ` Benjamin Herrenschmidt
2010-08-06 2:19 ` Darren Hart
2010-08-06 5:09 ` Vaidyanathan Srinivasan
2010-08-06 7:13 ` Darren Hart
2010-07-23 14:39 ` Will Schmidt
2010-08-04 13:44 ` Darren Hart
2010-08-19 15:58 ` Ankita Garg
2010-08-19 18:58 ` Will Schmidt
2010-08-23 22:20 ` Darren Hart
2010-08-31 7:12 ` Darren Hart
2010-09-01 5:54 ` Michael Ellerman
2010-09-01 15:10 ` Darren Hart
2010-09-01 18:47 ` Darren Hart
2010-09-01 19:59 ` Steven Rostedt
2010-09-01 20:42 ` Darren Hart
2010-09-02 1:02 ` Michael Neuling [this message]
2010-09-02 4:06 ` Steven Rostedt
2010-09-02 6:04 ` Darren Hart
2010-09-03 20:10 ` Will Schmidt
2010-09-02 23:04 ` Michael Neuling
2010-09-03 0:08 ` Darren Hart
2010-09-02 3:46 ` Michael Neuling
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=11579.1283389322@neuling.org \
--to=mikey@neuling.org \
--cc=ankita@in.ibm.com \
--cc=dvhltc@us.ibm.com \
--cc=ego@in.ibm.com \
--cc=josh@joshtriplett.org \
--cc=linuxppc-dev@ozlabs.org \
--cc=paulus@samba.org \
--cc=rostedt@goodmis.org \
--cc=sfr@canb.auug.org.au \
--cc=tglx@linutronix.de \
--cc=willschm@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.