Re: [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Darren Hart <dvhltc@us.ibm.com>
To: Ankita Garg <ankita@in.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>,
	Gautham R Shenoy <ego@in.ibm.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	linuxppc-dev@ozlabs.org, Will Schmidt <willschm@us.ibm.com>,
	Paul Mackerras <paulus@samba.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries
Date: Tue, 31 Aug 2010 00:12:50 -0700	[thread overview]
Message-ID: <4C7CAB72.2050305@us.ibm.com> (raw)
In-Reply-To: <20100819155824.GD2690@in.ibm.com>

On 08/19/2010 08:58 AM, Ankita Garg wrote:
> Hi Darren,
> 
> On Thu, Jul 22, 2010 at 11:24:13AM -0700, Darren Hart wrote:
>>
>> With some instrumentation we were able to determine that the
>> preempt_count() appears to change across the extended_cede_processor()
>> call.  Specifically across the plpar_hcall_norets(H_CEDE) call. On
>> PREEMPT_RT we call this with preempt_count=1 and return with
>> preempt_count=0xffffffff. On mainline with CONFIG_PREEMPT=y, the value
>> is different (0x65) but is still incorrect.
> 
> I was trying to reproduce this issue on a 2.6.33.7-rt29 kernel. I could
> easily reproduce this on the RT kernel and not the non-RT kernel.

This matches my experience.

> However, I hit it every single time I did a cpu online operation. I also
> noticed that the issue persists even when I disable H_CEDE by passing
> the "cede_offline=0" kernel commandline parameter. Could you pl confirm
> if you observe the same in your setup ?

Yes, this is my experience as well.

> 
> However, the issue still remains. Will spend few cycles looking into
> this issue.
> 

I've spent today trying to collect some useful traces. I consistently
see the preempt_count() change to 0xffffffff across the H_CEDE call, but
the irq and sched events (to ftrace) do not indicate anything else
running on the CPU that could change that.

  <idle>-0       1d.h1. 11408us : irq_handler_entry: irq=16 name=IPI
  <idle>-0       1d.h1. 11411us : irq_handler_exit: irq=16 ret=handled
  ...            <other cpus>
  <idle>-0       1d.... 22101us : .pseries_mach_cpu_die: start
  <idle>-0       1d.... 22108us : .pseries_mach_cpu_die: cpu 1 (hwid 1) ceding for offline with hint 2
  ...            <other cpus>
  <idle>-0       1d.Hff. 33804us : .pseries_mach_cpu_die: returned from cede with pcnt: ffffffff
  <idle>-0       1d.Hff. 33805us : .pseries_mach_cpu_die: forcing pcnt to 0
  <idle>-0       1d.... 33807us : .pseries_mach_cpu_die: cpu 1 (hwid 1) returned from cede.
  <idle>-0       1d.... 33808us : .pseries_mach_cpu_die: Decrementer value = 7fa49474 Timebase value = baefee6c88113
  <idle>-0       1d.... 33809us : .pseries_mach_cpu_die: cpu 1 (hwid 1) got prodded to go online
  <idle>-0       1d.... 33816us : .pseries_mach_cpu_die: calling start_seconday, pcnt: 0
  <idle>-0       1d.... 33816us : .pseries_mach_cpu_die: forcing pcnt to 0

---------------------------------
Modules linked in: autofs4 binfmt_misc dm_mirror dm_region_hash dm_log [last unloaded: scsi_wait_scan]
NIP: c00000000006aa50 LR: c00000000006ac40 CTR: c00000000006ac14
REGS: c00000008e79ffc0 TRAP: 0300   Not tainted  (2.6.33-rt-dvhrt16-02358-g61223dd-dirty)
MSR: 8000000000001032 <ME,IR,DR>  CR: 28000022  XER: 00000000
DAR: c000018000ba44c0, DSISR: 0000000040000000
TASK = c00000010e6de040[0] 'swapper' THREAD: c00000008e7a0000 CPU: 1
GPR00: 0000018000000040 c00000008e7a0240 c000000000b55008 c00000010e6de040
GPR04: c000000085d800c0 0000000000000000 0000000000000000 000000000000000f
GPR08: 0000018000000000 c00000008e7a0000 c000000000ba4480 c000000000a32c80
GPR12: 8000000000009032 c000000000ba4680 c00000008e7a08f0 0000000000000001
GPR16: fffffffffffff2fa c00000010e6de040 0000000000000000 7fffffffffffffff
GPR20: 0000000000000000 0000000000000001 0000000000000001 c000000000f42c80
GPR24: c0000000850b7638 0000000000000000 0000000000000000 0000000000000001
GPR28: c000000000f42c80 c00000010e6de040 c000000000ad7698 c00000008e7a0240
NIP [c00000000006aa50] .resched_task+0x48/0xd8
LR [c00000000006ac40] .check_preempt_curr_idle+0x2c/0x44
Call Trace:
Instruction dump:
f821ff71 7c3f0b78 ebc2ab28 7c7d1b78 60000000 60000000 e95e8008 e97e8000
e93d0008 81090010 79084da4 38080040 <7d4a002a> 7c0b502e 7c000074 7800d182
---[ end trace 6d3f1cddaa17382c ]---
Kernel panic - not syncing: Attempted to kill the idle task!
Call Trace:
Rebooting in 180 seconds..



When running with the function plugin I had to stop the trace
immediately before entering start_secondary after an online or my traces
would not include the pseries_mach_cpu_die function, nor the tracing I
added there (possibly buffer size, I am using 2048). The following trace
was collected using "trace-cmd record -p function -e irq -e sched" and
has been filtered to only show CPU [001] (the CPU undergoing the
offline/online test, and the one seeing preempt_count (pcnt) go to
ffffffff after cede. The function tracer does not indicate anything
running on the CPU other than the HCALL - unless the __trace_hcall*
commands might be to blame. Can a POWER guru comment?

          <idle>-0     [001]   417.625286: function:             .cpu_die
          <idle>-0     [001]   417.625287: function:                .pseries_mach_cpu_die
          <idle>-0     [001]   417.625290: bprint:               .pseries_mach_cpu_die : start
          <idle>-0     [001]   417.625291: function:                   .idle_task_exit
          <idle>-0     [001]   417.625292: function:                      .switch_slb
          <idle>-0     [001]   417.625294: function:                   .xics_teardown_cpu
          <idle>-0     [001]   417.625294: function:                      .xics_set_cpu_priority
          <idle>-0     [001]   417.625295: function:             .__trace_hcall_entry
          <idle>-0     [001]   417.625296: function:                .probe_hcall_entry
          <idle>-0     [001]   417.625297: function:             .__trace_hcall_exit
          <idle>-0     [001]   417.625298: function:                .probe_hcall_exit
          <idle>-0     [001]   417.625299: function:             .__trace_hcall_entry
          <idle>-0     [001]   417.625300: function:                .probe_hcall_entry
          <idle>-0     [001]   417.625301: function:             .__trace_hcall_exit
          <idle>-0     [001]   417.625302: function:                .probe_hcall_exit
          <idle>-0     [001]   417.625305: bprint:               .pseries_mach_cpu_die : cpu 1 (hwid 1) ceding for offline with hint 2
          <idle>-0     [001]   417.625307: bprint:               .pseries_mach_cpu_die : calling extended cede, pcnt: 0
          <idle>-0     [001]   417.625308: function:             .__trace_hcall_entry
          <idle>-0     [001]   417.625308: function:                .probe_hcall_entry
          <idle>-0     [001]   417.837734: function:             .__trace_hcall_exit
          <idle>-0     [001]   417.837736: function:                .probe_hcall_exit
          <idle>-0     [001]   417.837740: bprint:               .pseries_mach_cpu_die : returned from cede with pcnt: ffffffff
          <idle>-0     [001]   417.837742: bprint:               .pseries_mach_cpu_die : forcing pcnt to 0
          <idle>-0     [001]   417.837743: bprint:               .pseries_mach_cpu_die : cpu 1 (hwid 1) returned from cede.
          <idle>-0     [001]   417.837745: bprint:               .pseries_mach_cpu_die : Decrementer value = 79844cbf Timebase value = bb3cf7ab2771c
          <idle>-0     [001]   417.837747: bprint:               .pseries_mach_cpu_die : cpu 1 (hwid 1) got prodded to go online
          <idle>-0     [001]   417.837749: function:             .__trace_hcall_entry
          <idle>-0     [001]   417.837749: function:                .probe_hcall_entry
          <idle>-0     [001]   417.837755: function:             .__trace_hcall_exit
          <idle>-0     [001]   417.837755: function:                .probe_hcall_exit
          <idle>-0     [001]   417.837757: bprint:               .pseries_mach_cpu_die : calling start_seconday, pcnt: 0
          <idle>-0     [001]   417.837758: bprint:               .pseries_mach_cpu_die : forcing pcnt to 0



I replaced the extended_cede_processor() call with mdelay(2). When I do
that, I don't see the preempt_count() change across the mdelay(2) call.
However, the system still eventually crashes in sub_preempt_count().
This appears to be independent of if preempt_count changes across
CEDE (since it doesn't occur across mdelay). I will try with larger
values of mdelay tomorrow to see if a longer delay will experience the
preempt_count value change.

Badness at kernel/sched.c:3726
NIP: c000000000698684 LR: c000000000698668 CTR: c00000000007a89c
REGS: c00000008e7a0170 TRAP: 0700   Not tainted  (2.6.33-rt-dvhrt16-02358-g61223dd-dirty)
MSR: 8000000000021032 <ME,CE,IR,DR>  CR: 28000022  XER: 00000000
TASK = c00000010e6de040[0] 'swapper' THREAD: c00000008e7a0000 CPU: 1
GPR00: 0000000000000000 c00000008e7a03f0 c000000000b55008 0000000000000001
GPR04: 0000000000000000 0000000000000000 0000000000000000 000000000000000f
GPR08: 0000000000000200 c000000000ca5420 0000000000000001 c000000000bc22f0
GPR12: 8000000000009032 c000000000ba4680 c00000008e7a0a10 0000000000000001
GPR16: fffffffffffff4d1 c00000010e6de040 0000000000000000 7fffffffffffffff
GPR20: 0000000000000000 0000000000000001 0000000000000001 c000000000f42c80
GPR24: 0000000000000001 0000000000000000 0000000000000000 0000000000000001
GPR28: c000000000f42c80 0000000000000001 c000000000ad7698 c00000008e7a03f0
NIP [c000000000698684] .sub_preempt_count+0xa8/0xdc
LR [c000000000698668] .sub_preempt_count+0x8c/0xdc
Call Trace:
Instruction dump:
41fd0038 78000620 2fa00000 40fe002c 4bd08a61 60000000 2fa30000 419e002c
e93e87e8 80090000 2f800000 409e001c <0fe00000> 48000014 78290464 80090014
Unable to handle kernel paging request for data at address 0xc000018000ba44c0
Faulting instruction address: 0xc00000000006aa1c
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT SMP NR_CPUS=128 NUMA pSeries
last sysfs file:line



-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

next prev parent reply	other threads:[~2010-08-31  7:13 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-22 18:24 [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries Darren Hart
2010-07-22 18:36 ` Darren Hart
2010-07-22 18:38 ` Thomas Gleixner
2010-08-10 22:36   ` Darren Hart
2010-07-22 22:25 ` Benjamin Herrenschmidt
2010-07-22 23:57   ` Darren Hart
2010-07-23  4:44     ` Darren Hart
2010-07-23  5:08       ` Vaidyanathan Srinivasan
2010-07-23  5:11         ` Benjamin Herrenschmidt
2010-07-23  7:07           ` Vaidyanathan Srinivasan
2010-08-05  4:45             ` Darren Hart
2010-08-05 11:06               ` Vaidyanathan Srinivasan
2010-08-05 12:26                 ` Thomas Gleixner
2010-07-23  5:09       ` Benjamin Herrenschmidt
2010-08-06  2:19         ` Darren Hart
2010-08-06  5:09           ` Vaidyanathan Srinivasan
2010-08-06  7:13             ` Darren Hart
2010-07-23 14:39     ` Will Schmidt
2010-08-04 13:44   ` Darren Hart
2010-08-19 15:58 ` Ankita Garg
2010-08-19 18:58   ` Will Schmidt
2010-08-23 22:20   ` Darren Hart
2010-08-31  7:12   ` Darren Hart [this message]
2010-09-01  5:54     ` Michael Ellerman
2010-09-01 15:10       ` Darren Hart
2010-09-01 18:47         ` Darren Hart
2010-09-01 19:59           ` Steven Rostedt
2010-09-01 20:42             ` Darren Hart
2010-09-02  1:02               ` Michael Neuling
2010-09-02  4:06                 ` Steven Rostedt
2010-09-02  6:04                   ` Darren Hart
2010-09-03 20:10                     ` Will Schmidt
2010-09-02 23:04                   ` Michael Neuling
2010-09-03  0:08                     ` Darren Hart
2010-09-02  3:46           ` Michael Neuling

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C7CAB72.2050305@us.ibm.com \
    --to=dvhltc@us.ibm.com \
    --cc=ankita@in.ibm.com \
    --cc=ego@in.ibm.com \
    --cc=josh@joshtriplett.org \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=paulus@samba.org \
    --cc=rostedt@goodmis.org \
    --cc=sfr@canb.auug.org.au \
    --cc=tglx@linutronix.de \
    --cc=willschm@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.