From: Frederic Weisbecker <fweisbec@gmail.com>
To: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	linaro-sched-sig@lists.linaro.org,
	Alessio Igor Bogani <abogani@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Avi Kivity <avi@redhat.com>, Chris Metcalf <cmetcalf@tilera.com>,
	Christoph Lameter <cl@linux.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Geoff Levand <geoff@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Max Krasnyansky <maxk@qualcomm.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Stephen Hemminger <shemminger@vyatta.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Sven-Thorsten Dietrich <thebigcorporation@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Zen Lin <zen@openhuawei.org>
Subject: Re: [RFC][PATCH 00/32] Nohz cpusets v2 (adaptive tickless kernel)
Date: Wed, 28 Mar 2012 13:43:27 +0200
Message-ID: <20120328114323.GB17189@somewhere.redhat.com>
In-Reply-To: <CAOtvUMf0Zp9C-jT=ymEo6oyxnsgkvSiGDUs5V_56v4Ai2QyBJQ@mail.gmail.com>

On Tue, Mar 27, 2012 at 05:02:34PM +0200, Gilad Ben-Yossef wrote:
> On Wed, Mar 21, 2012 at 3:58 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > Hi all,
> >
> > A summary of what this is about can be found here:
> >  https://lkml.org/lkml/2011/8/15/245
> >
> > There are still a lot of things to handle. Especially about
> > what is done by scheduler_tick() but we also need to:
> >
> > - completely handle cputime accounting (need to find every "reader"
> > of cputime and flush cputimes for all of them).
> > - handle perf
> > - handle irqtime finegrained accounting
> > - handle ilb load balancing
> > - etc...
> >
> 
> I gave the new version a spin (x86 8 way VM) and it looks cool.
> 
> I did get the following warning once, but couldn't recreate it:
> 
> [   31.812741] ------------[ cut here ]------------
> [   31.812741] WARNING: at /home/giladb/Workspace/linux/kernel/time/tick-sched.c:706 tick_nohz_account_ticks+0x7c/0x90()
> [   31.812741] Hardware name: Bochs
> [   31.812741] Modules linked in:
> [   31.812741] Pid: 1006, comm: sh Not tainted 3.3.0-rc7+ #167
> [   31.812741] Call Trace:
> [   31.812741]  [<c102a3ad>] warn_slowpath_common+0x6d/0xa0
> [   31.812741]  [<c106be0c>] ? tick_nohz_account_ticks+0x7c/0x90
> [   31.812741]  [<c106be0c>] ? tick_nohz_account_ticks+0x7c/0x90
> [   31.812741]  [<c102a3fd>] warn_slowpath_null+0x1d/0x20
> [   31.812741]  [<c106be0c>] tick_nohz_account_ticks+0x7c/0x90
> [   31.812741]  [<c106be5f>] tick_nohz_flush_current_times+0x3f/0x80
> [   31.812741]  [<c106bf8d>] tick_nohz_restart_adaptive+0xd/0x30
> [   31.812741]  [<c106c02e>] tick_nohz_check_adaptive+0x3e/0x50
> [   31.812741]  [<c1018180>] smp_cpuset_update_nohz_interrupt+0x20/0x30
> [   31.812741]  [<c1639c6a>] cpuset_update_nohz_interrupt+0x2a/0x30
> [   31.812741]  [<c16395fd>] ? _raw_spin_unlock_irq+0xd/0x30
> [   31.812741]  [<c10575c6>] finish_task_switch+0x46/0xa0
> [   31.812741]  [<c1638558>] __schedule+0x398/0x910
> [   31.812741]  [<c10ef2f1>] ? deactivate_slab+0x611/0x730
> [   31.812741]  [<c1120777>] ? __find_get_block+0x97/0x1a0
> [   31.812741]  [<c1221214>] ? cpumask_next_and+0x24/0xa0
> [   31.812741]  [<c10558cb>] ? get_parent_ip+0xb/0x40
> [   31.812741]  [<c1638b50>] schedule+0x30/0x50
> [   31.812741]  [<c16379b5>] schedule_hrtimeout_range_clock+0xf5/0x110
> [   31.812741]  [<c10558cb>] ? get_parent_ip+0xb/0x40
> [   31.812741]  [<c10586db>] ? sub_preempt_count+0x7b/0xb0
> [   31.812741]  [<c1639633>] ? _raw_spin_unlock_irqrestore+0x13/0x40
> [   31.812741]  [<c1054140>] ? __wake_up+0x40/0x50
> [   31.812741]  [<c1294d1f>] ? put_ldisc+0x3f/0xa0
> [   31.812741]  [<c16379e2>] schedule_hrtimeout_range+0x12/0x20
> [   31.812741]  [<c1107969>] poll_schedule_timeout+0x39/0x60
> [   31.812741]  [<c1108020>] do_sys_poll+0x400/0x490
> [   31.812741]  [<c1054d15>] ? cpuacct_charge+0x65/0x70
> [   31.812741]  [<c1107a20>] ? poll_freewait+0x70/0x70
> [   31.812741]  [<c1107af0>] ? __pollwait+0xd0/0xd0
> [   31.812741]  [<c1107af0>] ? __pollwait+0xd0/0xd0
> [   31.812741]  [<c10094a3>] ? native_sched_clock+0x33/0xe0
> [   31.812741]  [<c105a0e2>] ? sched_clock_local+0xb2/0x190
> [   31.812741]  [<c1054d15>] ? cpuacct_charge+0x65/0x70
> [   31.812741]  [<c105b376>] ? update_curr+0x1a6/0x2a0
> [   31.812741]  [<c105a2f9>] ? sched_clock_cpu+0x139/0x190
> [   31.812741]  [<c105a0e2>] ? sched_clock_local+0xb2/0x190
> [   31.812741]  [<c104dd43>] ? hrtimer_forward+0x163/0x1b0
> [   31.812741]  [<c10644e2>] ? ktime_get+0x62/0x100
> [   31.812741]  [<c1018b56>] ? lapic_next_event+0x16/0x20
> [   31.812741]  [<c1069df2>] ? clockevents_program_event+0xc2/0x170
> [   31.812741]  [<c106b514>] ? tick_program_event+0x24/0x30
> [   31.812741]  [<c104cd1d>] ? hrtimer_interrupt+0x1ad/0x2e0
> [   31.812741]  [<c1095128>] ? rcu_pending+0x58/0x70
> [   31.812741]  [<c1030a3d>] ? irq_exit+0x6d/0x80
> [   31.812741]  [<c1019363>] ? smp_apic_timer_interrupt+0x53/0x90
> [   31.812741]  [<c11e0128>] ? avc_has_perm_noaudit+0xc8/0x360
> [   31.812741]  [<c163a3b6>] ? apic_timer_interrupt+0x2a/0x30
> [   31.812741]  [<c128f31e>] ? tty_ioctl+0x47e/0xa30
> [   31.812741]  [<c11e0d66>] ? inode_has_perm+0x36/0x50
> [   31.812741]  [<c11e13e8>] ? file_has_perm+0xa8/0xb0
> [   31.812741]  [<c128eea0>] ? tty_check_change+0xe0/0xe0
> [   31.812741]  [<c1106763>] ? do_vfs_ioctl+0x83/0x570
> [   31.812741]  [<c11e4e46>] ? selinux_file_ioctl+0x56/0x110
> [   31.812741]  [<c1108224>] sys_poll+0x54/0xb0
> [   31.812741]  [<c1639b29>] syscall_call+0x7/0xb
> [   31.812741] ---[ end trace 1d7d659b4aead681 ]---

Ah, interesting. I think I see how that happened: we flushed the
time in tick_nohz_pre_schedule() and set SAVED_JIFFIES_NONE.
Then we received a nohz IPI before we could restart the tick
from tick_nohz_post_schedule(). With ts->tick_stopped set, we expect
ts->saved_jiffies_whence != SAVED_JIFFIES_NONE, but that assumption
is wrong in this window.

I'll fix that.
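
To make the window concrete, here is a minimal userspace sketch of the
ordering described above. It is only a model, not the patchset code: the
struct, the helper names and the warnings counter are hypothetical, and
only tick_stopped, saved_jiffies_whence and SAVED_JIFFIES_NONE come from
the thread (SAVED_JIFFIES_USER is an assumed placeholder for "a snapshot
is held").

```c
#include <stdbool.h>

/* Hypothetical model; only the field and constant names echo the thread. */
enum saved_whence { SAVED_JIFFIES_NONE, SAVED_JIFFIES_USER };

struct tick_state {
	bool tick_stopped;
	enum saved_whence saved_jiffies_whence;
	int warnings;	/* stands in for the WARN hit in the trace */
};

/* Assumed check: a stopped tick implies that a jiffies snapshot is held. */
static void account_ticks(struct tick_state *ts)
{
	if (ts->tick_stopped && ts->saved_jiffies_whence == SAVED_JIFFIES_NONE)
		ts->warnings++;
}

/* Flush the pending time, then record that no snapshot is held. */
static void pre_schedule(struct tick_state *ts)
{
	account_ticks(ts);
	ts->saved_jiffies_whence = SAVED_JIFFIES_NONE;
}

/* The nohz IPI flushes again if the tick still looks stopped. */
static void nohz_ipi(struct tick_state *ts)
{
	if (ts->tick_stopped)
		account_ticks(ts);
}

/* The tick is restarted only here, after the context switch. */
static void post_schedule(struct tick_state *ts)
{
	ts->tick_stopped = false;
}

/* Returns how many times the check fired for the given IPI timing. */
int run_sequence(bool ipi_in_window)
{
	struct tick_state ts = {
		.tick_stopped = true,
		.saved_jiffies_whence = SAVED_JIFFIES_USER,
		.warnings = 0,
	};

	pre_schedule(&ts);
	if (ipi_in_window)
		nohz_ipi(&ts);	/* lands between pre and post schedule */
	post_schedule(&ts);
	if (!ipi_in_window)
		nohz_ipi(&ts);	/* lands after the tick was restarted */
	return ts.warnings;
}
```

In this model the check fires only when the IPI lands between the two
schedule hooks, which would also fit the warning being hard to reproduce.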

> 
> With the two patches I'll attach to the next replies to this message,
> I've been able to get a task running
> on an isolated CPU with 0 timer interrupts.
> 
> In my case, I also had to disable the clocksource watchdog, but only
> because TSC is not stable on my VM.
> This is really not a nohz/cpuset problem.

Yeah, that's a separate issue of its own. Luckily I don't have it
on my main testbox.

> There is one source of interference to cpu isolation this causes,
> which is the cputime flush IPI. Every time you
> run a command in the shell you get 3 - 4 IPIs sent to the nohz cpuset
> to flush the cputimes so that thread group
> times get computed correctly. That's not very nice :-)
> 
> I've tried disabling the IPI send, just to see how it goes and as far
> as I've been able to tell you get bare metal like
> environment for a 100% cpu bound code with no interrupts. Of course,
> ps/top then show 0% cpu utilization for
> that task since without the IPI the time it spends on the CPU is not
> registered... that is a small price to pay
> in my eyes for bare metal performance on Linux, but what do I know? :-)

Yeah, I'm sure we can reduce the number of IPIs for the nohz thing. For now
I've just set up one big IPI that executes on every tickless CPU for cases
like cputime. And maybe too many IPIs are sent for the scheduler and RCU. We
can certainly optimize all of that. I'm not at the optimization stage yet, but
rather still in the correctness one, unfortunately :)
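
As a rough illustration of why readers currently need that flush, here is
a small userspace model of deferred cputime on a tickless CPU. Everything
in it is an assumption made for the sketch (the struct, the function names
and the plain jiffies-delta accounting); it is not the code from the
series.

```c
#include <stdbool.h>

/* Hypothetical model of one tickless CPU running a CPU-bound task. */
struct nohz_cpu {
	unsigned long saved_jiffies;	/* snapshot taken when the tick stopped */
	unsigned long utime;		/* what ps/top-like readers can see */
};

/* What the flush IPI is assumed to do: fold the elapsed delta into utime. */
void nohz_flush(struct nohz_cpu *cpu, unsigned long now)
{
	cpu->utime += now - cpu->saved_jiffies;
	cpu->saved_jiffies = now;
}

/* A reader either triggers a flush first or sees the stale value. */
unsigned long read_utime(struct nohz_cpu *cpu, unsigned long now, bool send_ipi)
{
	if (send_ipi)
		nohz_flush(cpu, now);
	return cpu->utime;
}
```

With the flush skipped, the reader keeps seeing the value from before the
tick was stopped, which matches the 0% utilization reported above. Any
scheme that sends fewer IPIs has to give readers another way to obtain
that delta.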

Thanks!

> 
> Overall, way cool. Please keep it up !
> 
> Gilad
> 
> -- 
> Gilad Ben-Yossef
> Chief Coffee Drinker
> gilad@benyossef.com
> Israel Cell: +972-52-8260388
> US Cell: +1-973-8260388
> http://benyossef.com
> 
> "If you take a class in large-scale robotics, can you end up in a
> situation where the homework eats your dog?"
>  -- Jean-Baptiste Queru
