rcu stalls seen with numasched_v2 patches applied.


From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>,
	john stultz <johnstul@us.ibm.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: rcu stalls seen with numasched_v2 patches applied.
Date: Tue, 7 Aug 2012 18:03:05 +0530
Message-ID: <20120807123305.GA7137@linux.vnet.ibm.com>

Hi, 

I saw this while running the 2nd August -tip kernel + Peter's
numasched patches.

Top showed a load average of 240. One cpu (cpu 7) showed 100%
utilization while all other cpus were idle, and the system was somewhat
sluggish. Before I saw this, I had run Andrea's autonuma benchmark a
couple of times.

I am not sure whether this is an already reported or known issue.

 INFO: rcu_sched self-detected stall on CPU { 7}  (t=105182911 jiffies)
 Pid: 5173, comm: qpidd Tainted: G        W    3.5.0numasched_v2_020812+ #1
 Call Trace:
 <IRQ>  [<ffffffff810d4c7e>] rcu_check_callbacks+0x18e/0x650
 [<ffffffff81060918>] update_process_times+0x48/0x90
 [<ffffffff810a2a7e>] tick_sched_timer+0x6e/0xe0
 [<ffffffff810789a5>] __run_hrtimer+0x75/0x1a0
 [<ffffffff810a2a10>] ? tick_setup_sched_timer+0x100/0x100
 [<ffffffff810591cf>] ? __do_softirq+0x13f/0x240
 [<ffffffff81078d56>] hrtimer_interrupt+0xf6/0x240
 [<ffffffff814f0179>] smp_apic_timer_interrupt+0x69/0x99
 [<ffffffff814ef14a>] apic_timer_interrupt+0x6a/0x70
 <EOI>  [<ffffffff814e64b2>] ? _raw_spin_unlock_irqrestore+0x12/0x20
 [<ffffffff81082552>] sched_setnode+0x82/0xf0
 [<ffffffff8108bd38>] task_numa_work+0x1e8/0x240
 [<ffffffff81070c6c>] task_work_run+0x6c/0x80
 [<ffffffff81013984>] do_notify_resume+0x94/0xa0
 [<ffffffff814e6a6c>] retint_signal+0x48/0x8c
 INFO: rcu_sched self-detected stall on CPU { 7}  (t=105362914 jiffies)
 Pid: 5173, comm: qpidd Tainted: G        W    3.5.0numasched_v2_020812+ #1
 Call Trace:
 <IRQ>  [<ffffffff810d4c7e>] rcu_check_callbacks+0x18e/0x650
 [<ffffffff81060918>] update_process_times+0x48/0x90
 [<ffffffff810a2a7e>] tick_sched_timer+0x6e/0xe0
 [<ffffffff810789a5>] __run_hrtimer+0x75/0x1a0
 [<ffffffff810a2a10>] ? tick_setup_sched_timer+0x100/0x100
 [<ffffffff810591cf>] ? __do_softirq+0x13f/0x240
 [<ffffffff81078d56>] hrtimer_interrupt+0xf6/0x240
 [<ffffffff814f0179>] smp_apic_timer_interrupt+0x69/0x99
 [<ffffffff814ef14a>] apic_timer_interrupt+0x6a/0x70
 <EOI>  [<ffffffff81082562>] ? sched_setnode+0x92/0xf0
 [<ffffffff81082552>] ? sched_setnode+0x82/0xf0
 [<ffffffff8108bd38>] task_numa_work+0x1e8/0x240
 [<ffffffff81070c6c>] task_work_run+0x6c/0x80
 [<ffffffff81013984>] do_notify_resume+0x94/0xa0
 [<ffffffff814e6a6c>] retint_signal+0x48/0x8c
 INFO: rcu_sched self-detected stall on CPU { 7}  (t=105542917 jiffies)
 Pid: 5173, comm: qpidd Tainted: G        W    3.5.0numasched_v2_020812+ #1
 Call Trace:
 <IRQ>  [<ffffffff810d4c7e>] rcu_check_callbacks+0x18e/0x650
 [<ffffffff81060918>] update_process_times+0x48/0x90
 [<ffffffff810a2a7e>] tick_sched_timer+0x6e/0xe0
 [<ffffffff810789a5>] __run_hrtimer+0x75/0x1a0
 [<ffffffff810a2a10>] ? tick_setup_sched_timer+0x100/0x100
 [<ffffffff810591cf>] ? __do_softirq+0x13f/0x240
 [<ffffffff81078d56>] hrtimer_interrupt+0xf6/0x240
 [<ffffffff814f0179>] smp_apic_timer_interrupt+0x69/0x99
 [<ffffffff814ef14a>] apic_timer_interrupt+0x6a/0x70
 <EOI>  [<ffffffff814e64b2>] ? _raw_spin_unlock_irqrestore+0x12/0x20
 [<ffffffff81082552>] sched_setnode+0x82/0xf0
 [<ffffffff8108bd38>] task_numa_work+0x1e8/0x240
 [<ffffffff81070c6c>] task_work_run+0x6c/0x80
 [<ffffffff81013984>] do_notify_resume+0x94/0xa0
 [<ffffffff814e6a6c>] retint_signal+0x48/0x8c


<these messages keep repeating>
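
For anyone comparing repeated reports like the ones above, here is a
minimal sketch that extracts the t= values from the stall lines and
computes the gap between consecutive reports. The function names and the
hz parameter are hypothetical, introduced only for illustration; the HZ
value of this kernel is not stated in this report, so any conversion to
seconds is an assumption.

```python
import re

# Matches lines such as:
#  INFO: rcu_sched self-detected stall on CPU { 7}  (t=105182911 jiffies)
STALL_RE = re.compile(
    r"self-detected stall on CPU \{\s*(\d+)\}\s+\(t=(\d+) jiffies\)"
)

def stall_reports(log_text):
    """Return a (cpu, t_jiffies) tuple for each self-detected stall line."""
    return [(int(cpu), int(t)) for cpu, t in STALL_RE.findall(log_text)]

def report_gaps(reports):
    """Return the jiffies elapsed between consecutive stall reports."""
    times = [t for _, t in reports]
    return [later - earlier for earlier, later in zip(times, times[1:])]

def gaps_in_seconds(gaps, hz):
    """Convert jiffies to seconds. hz is an assumption supplied by the
    caller; it must match the CONFIG_HZ of the kernel that logged."""
    return [gap / hz for gap in gaps]

# The three stall lines quoted in this report.
LOG = """\
 INFO: rcu_sched self-detected stall on CPU { 7}  (t=105182911 jiffies)
 INFO: rcu_sched self-detected stall on CPU { 7}  (t=105362914 jiffies)
 INFO: rcu_sched self-detected stall on CPU { 7}  (t=105542917 jiffies)
"""

reports = stall_reports(LOG)
gaps = report_gaps(reports)
print(gaps)
```

On the three lines quoted here, both gaps come out to 180003 jiffies, so
the reports are evenly spaced and all name cpu 7.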

I saw this on a 2-node, 24-cpu machine.

If I am able to reproduce this, I plan to test again without the
numasched patches applied.

-- 
Thanks and Regards
Srikar Dronamraju
