public inbox for linux-kernel@vger.kernel.org
* [git pull] scheduler updates for v2.6.24
@ 2007-10-15 14:17 Ingo Molnar
  2007-10-15 15:04 ` Ingo Molnar
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Ingo Molnar @ 2007-10-15 14:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton


Linus, please pull the latest scheduler git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git

It contains lots of scheduler updates from lots of people - hopefully 
the last big one for quite some time. Most of the focus was on 
performance (both micro-performance and scalability/balancing), but 
there's the fair-scheduling feature now Kconfig selectable too. Find the 
shortlog below.

Code that is touched outside of the scheduler: the KVM bits were acked 
by Avi, the net/unix change is trivial and only affects sync wakeups, 
ditto the fs/pipe.c changes - but i can push those separately if it 
needs an ack from David first.

ABI/API changes:

 - new CONFIG_FAIR_USER_SCHED and /sys/kernel/uids/ + uevent API.
 - /proc/stat and /proc/<pid>/stat changes for guest-CPU usage [KVM]
 - /proc/sched_debug formats changed/enhanced
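
As a hedged illustration of the /proc/stat change: assuming the guest-CPU time is appended as the ninth value on each "cpu" line (after user, nice, system, idle, iowait, irq, softirq and steal), a user-space reader could pick it up roughly as sketched below. The helper name parse_cpu_guest() and the field order are assumptions made for this example, not something taken from the patches.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the guest field from one "cpu" line of
 * /proc/stat. Assumes the field order user, nice, system, idle, iowait,
 * irq, softirq, steal, guest. Returns 0 on success, -1 if the line is
 * not a cpu line or carries no guest field (e.g. an older kernel).
 */
int parse_cpu_guest(const char *line, unsigned long long *guest)
{
	unsigned long long v[9];
	char label[16];
	int n;

	assert(line != NULL && guest != NULL);

	n = sscanf(line,
		   "%15s %llu %llu %llu %llu %llu %llu %llu %llu %llu",
		   label, &v[0], &v[1], &v[2], &v[3], &v[4],
		   &v[5], &v[6], &v[7], &v[8]);

	/* Only "cpu" and "cpuN" lines carry these counters. */
	if (n < 1 || strncmp(label, "cpu", 3) != 0)
		return -1;

	/* Label plus nine counters are needed to reach the guest field. */
	if (n < 10)
		return -1;

	*guest = v[8];
	return 0;
}
```

A caller would read /proc/stat line by line and treat a -1 return as "no guest accounting available" rather than as an error.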

Testing status: the changes are chronological and all the 
interactivity-impacting changes are near the head of the queue and most 
of them were done weeks ago, and were thus part of the CFS-v22 backport 
series - which was tested by many people. There are no known regressions 
at the moment. It's all fully bisectable.

Thanks,

	Ingo

------------------>
Alexey Dobriyan (1):
      sched: uninline scheduler

Andi Kleen (4):
      sched: cleanup: remove unnecessary gotos
      sched: cleanup: refactor common code of sleep_on / wait_for_completion
      sched: cleanup: refactor normalize_rt_tasks
      sched: remove stale comment from sched_group_set_shares()

Arjan van de Ven (1):
      Make scheduler debug file operations const

Dhaval Giani (1):
      sched: group scheduling, sysfs tunables

Dmitry Adamushko (14):
      sched: clean up struct load_stat
      sched: clean up schedstat block in dequeue_entity()
      sched: sched_setscheduler() fix
      sched: add set_curr_task() calls
      sched: do not keep current in the tree and get rid of sched_entity::fair_key
      sched: optimize task_new_fair()
      sched: simplify sched_class::yield_task()
      sched: rework enqueue/dequeue_entity() to get rid of set_curr_task()
      sched: yield fix
      sched: fix __pick_next_entity()
      sched: tidy up SCHED_RR
      sched: cleanup, remove calc_weighted()
      sched: cleanup, make dequeue_entity() and update_stats_wait_end() similar
      sched: fix group scheduling for SCHED_BATCH

Gautham R Shenoy (1):
      sched: fix rt ptracer monopolizing CPU

Hiroshi Shimamoto (1):
      sched: clean up sched_fork()

Ingo Molnar (71):
      sched: fix sysctl_sched_child_runs_first flag
      sched: resched task in task_new_fair()
      sched: small sched_debug cleanup
      sched: debug: track maximum 'slice'
      sched: uniform tunings
      sched: use constants if !CONFIG_SCHED_DEBUG
      sched: remove stat_gran
      sched: remove precise CPU load
      sched: remove precise CPU load calculations #2
      sched: track cfs_rq->curr on !group-scheduling too
      sched: cleanup: simplify cfs_rq_curr() methods
      sched: uninline __enqueue_entity()/__dequeue_entity()
      sched: speed up update_load_add/_sub()
      sched: clean up calc_weighted()
      sched: introduce se->vruntime
      sched: move sched_feat() definitions
      sched: optimize vruntime based scheduling
      sched: simplify check_preempt() methods
      sched: wakeup granularity increase
      sched: add se->vruntime debugging
      sched: remove SCHED_FEAT_SKIP_INITIAL
      sched: add more vruntime statistics
      sched: debug: update exec_clock only when SCHED_DEBUG
      sched: remove wait_runtime limit
      sched: remove wait_runtime fields and features
      sched: x86: allow single-depth wchan output
      sched: fix delay accounting performance regression
      sched: prettify /proc/sched_debug output
      sched: enhance debug output
      sched: kernel/sched_fair.c whitespace cleanups
      sched: fair-group sched, cleanups
      sched: enable CONFIG_FAIR_GROUP_SCHED=y by default
      sched debug: BKL usage statistics
      sched: remove unneeded tunables
      sched debug: print settings
      sched debug: more width for parameter printouts
      sched: entity_key() fix
      sched: remove condition from set_task_cpu()
      sched: remove last_min_vruntime effect
      sched: undo some of the recent changes
      sched: fix sign check error in place_entity()
      sched: fix sched_fork()
      sched: remove set_leftmost()
      sched: clean up schedstats, cnt -> count
      sched: cleanup, remove stale comment
      sched: mark scheduling classes as const
      sched: whitespace cleanups
      sched: vslice fixups for non-0 nice levels
      sched: optimize schedule() a bit on SMP
      sched: tweak wakeup granularity
      sched: run sched_domain_debug() if CONFIG_SCHED_DEBUG=y
      sched: break out if printing a warning in sched_domain_debug()
      sched: style cleanup
      sched: kfree(NULL) is valid
      sched: cleanup: rename SCHED_FEAT_USE_TREE_AVG to SCHED_FEAT_TREE_AVG
      sched: cleanup: rename task_grp to task_group
      sched: cleanup: function prototype cleanups
      sched: fix: move the CPU check into ->task_new_fair()
      sched: update comment
      sched: clean up is_migration_thread()
      sched: do not normalize kernel threads via SysRq-N
      sched: do not wakeup-preempt with SCHED_BATCH tasks
      sched: speed up context-switches a bit
      sched: reintroduce cache-hot affinity
      sched: debug: increase width of debug line
      sched: debug, improve migration statistics
      sched: allow the immediate migration of cache-cold tasks
      sched: reintroduce topology.h tunings
      sched: enable wake-idle on CONFIG_SCHED_MC=y
      sched: affine sync wakeups
      sched: sync wakeups preempt too

Laurent Vivier (4):
      sched: guest CPU accounting: add guest-CPU /proc/stat field
      sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields
      sched: guest CPU accounting: maintain stats in account_system_time()
      sched: guest CPU accounting: maintain guest state in KVM

Matthias Kaehlcke (1):
      sched: use list_for_each_entry_safe() in __wake_up_common()

Mike Galbraith (4):
      sched: fix SMP migration latencies
      sched: fix formatting of /proc/sched_debug
      sched: cleanup, remove the TASK_NONINTERACTIVE flag
      sched: prevent wakeup over-scheduling

Milton Miller (5):
      sched: domain sysctl fixes: use kcalloc()
      sched: domain sysctl fixes: use for_each_online_cpu()
      sched: domain sysctl fixes: unregister the sysctl table before domains
      sched: domain sysctl fixes: do not crash on allocation failure
      sched: domain sysctl fixes: add terminator comment

Paul E. McKenney (1):
      sched: export cpu_clock()

Peter Williams (2):
      sched: reduce balance-tasks overhead
      sched: isolate SMP balancing code a bit more

Peter Zijlstra (16):
      sched: simplify SCHED_FEAT_* code
      sched: new task placement for vruntime
      sched: simplify adaptive latency
      sched: clean up new task placement
      sched: add tree based averages
      sched: handle vruntime 64-bit overflow
      sched: better min_vruntime tracking
      sched: add vslice
      sched debug: check spread
      sched: max_vruntime() simplification
      sched: clean up min_vruntime use
      sched: speed up and simplify vslice calculations
      sched: another wakeup_granularity fix
      sched: disable sleeper_fairness on SCHED_BATCH
      sched: disable forced preemption by default
      sched: activate task_hot() only on fair-scheduled tasks

S.Caglar Onur (1):
      sched debug: BKL usage statistics, fix

Srivatsa Vaddagiri (13):
      sched: group-scheduler core
      sched: revert recent removal of set_curr_task()
      sched: fix minor bug in yield
      sched: print nr_running and load in /proc/sched_debug
      sched: print &rq->cfs stats
      sched: clean up code under CONFIG_FAIR_GROUP_SCHED
      sched: add fair-user scheduler
      sched: group scheduler wakeup latency fix
      sched: group scheduler SMP migration fix
      sched: group scheduler, fix coding style issues
      sched: group scheduler, fix bloat
      sched: group scheduler, fix latency
      sched: generate uevents for user creation/destruction

Zou Nan hai (1):
      sched: some proc entries are missed in sched_domain sys_ctl debug code

 Documentation/sched-design-CFS.txt |   67 +
 arch/i386/Kconfig                  |   11 
 drivers/kvm/kvm.h                  |   10 
 drivers/kvm/kvm_main.c             |    2 
 fs/pipe.c                          |    9 
 fs/proc/array.c                    |   17 
 fs/proc/base.c                     |    2 
 fs/proc/proc_misc.c                |   15 
 include/linux/kernel_stat.h        |    1 
 include/linux/sched.h              |  108 +-
 include/linux/topology.h           |    5 
 init/Kconfig                       |   21 
 kernel/delayacct.c                 |    2 
 kernel/exit.c                      |    6 
 kernel/fork.c                      |    3 
 kernel/ksysfs.c                    |    8 
 kernel/sched.c                     | 1526 +++++++++++++++++++++----------------
 kernel/sched_debug.c               |  282 ++++--
 kernel/sched_fair.c                |  859 ++++++++------------
 kernel/sched_idletask.c            |   26 
 kernel/sched_rt.c                  |   51 -
 kernel/sched_stats.h               |   28 
 kernel/sysctl.c                    |   37 
 kernel/user.c                      |  249 +++++-
 net/unix/af_unix.c                 |    4 
 25 files changed, 1998 insertions(+), 1351 deletions(-)


* Re: [git pull] scheduler updates for v2.6.24
  2007-10-15 14:17 [git pull] scheduler updates for v2.6.24 Ingo Molnar
@ 2007-10-15 15:04 ` Ingo Molnar
  2007-10-15 18:35 ` Andrew Morton
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Ingo Molnar @ 2007-10-15 15:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton


* Ingo Molnar <mingo@elte.hu> wrote:

> Linus, please pull the latest scheduler git tree from:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git

oops, these two cleanups caused build failures in some config variants:

>       sched: reduce balance-tasks overhead
>       sched: isolate SMP balancing code a bit more

so i dropped them and re-pushed. New shortlog below.

	Ingo

------------------>
Alexey Dobriyan (1):
      sched: uninline scheduler

Andi Kleen (4):
      sched: cleanup: remove unnecessary gotos
      sched: cleanup: refactor common code of sleep_on / wait_for_completion
      sched: cleanup: refactor normalize_rt_tasks
      sched: remove stale comment from sched_group_set_shares()

Arjan van de Ven (1):
      Make scheduler debug file operations const

Dhaval Giani (1):
      sched: group scheduling, sysfs tunables

Dmitry Adamushko (14):
      sched: clean up struct load_stat
      sched: clean up schedstat block in dequeue_entity()
      sched: sched_setscheduler() fix
      sched: add set_curr_task() calls
      sched: do not keep current in the tree and get rid of sched_entity::fair_key
      sched: optimize task_new_fair()
      sched: simplify sched_class::yield_task()
      sched: rework enqueue/dequeue_entity() to get rid of set_curr_task()
      sched: yield fix
      sched: fix __pick_next_entity()
      sched: tidy up SCHED_RR
      sched: cleanup, remove calc_weighted()
      sched: cleanup, make dequeue_entity() and update_stats_wait_end() similar
      sched: fix group scheduling for SCHED_BATCH

Gautham R Shenoy (1):
      sched: fix rt ptracer monopolizing CPU

Hiroshi Shimamoto (1):
      sched: clean up sched_fork()

Ingo Molnar (71):
      sched: fix sysctl_sched_child_runs_first flag
      sched: resched task in task_new_fair()
      sched: small sched_debug cleanup
      sched: debug: track maximum 'slice'
      sched: uniform tunings
      sched: use constants if !CONFIG_SCHED_DEBUG
      sched: remove stat_gran
      sched: remove precise CPU load
      sched: remove precise CPU load calculations #2
      sched: track cfs_rq->curr on !group-scheduling too
      sched: cleanup: simplify cfs_rq_curr() methods
      sched: uninline __enqueue_entity()/__dequeue_entity()
      sched: speed up update_load_add/_sub()
      sched: clean up calc_weighted()
      sched: introduce se->vruntime
      sched: move sched_feat() definitions
      sched: optimize vruntime based scheduling
      sched: simplify check_preempt() methods
      sched: wakeup granularity increase
      sched: add se->vruntime debugging
      sched: remove SCHED_FEAT_SKIP_INITIAL
      sched: add more vruntime statistics
      sched: debug: update exec_clock only when SCHED_DEBUG
      sched: remove wait_runtime limit
      sched: remove wait_runtime fields and features
      sched: x86: allow single-depth wchan output
      sched: fix delay accounting performance regression
      sched: prettify /proc/sched_debug output
      sched: enhance debug output
      sched: kernel/sched_fair.c whitespace cleanups
      sched: fair-group sched, cleanups
      sched: enable CONFIG_FAIR_GROUP_SCHED=y by default
      sched debug: BKL usage statistics
      sched: remove unneeded tunables
      sched debug: print settings
      sched debug: more width for parameter printouts
      sched: entity_key() fix
      sched: remove condition from set_task_cpu()
      sched: remove last_min_vruntime effect
      sched: undo some of the recent changes
      sched: fix sign check error in place_entity()
      sched: fix sched_fork()
      sched: remove set_leftmost()
      sched: clean up schedstats, cnt -> count
      sched: cleanup, remove stale comment
      sched: mark scheduling classes as const
      sched: whitespace cleanups
      sched: vslice fixups for non-0 nice levels
      sched: optimize schedule() a bit on SMP
      sched: tweak wakeup granularity
      sched: run sched_domain_debug() if CONFIG_SCHED_DEBUG=y
      sched: break out if printing a warning in sched_domain_debug()
      sched: style cleanup
      sched: kfree(NULL) is valid
      sched: cleanup: rename SCHED_FEAT_USE_TREE_AVG to SCHED_FEAT_TREE_AVG
      sched: cleanup: rename task_grp to task_group
      sched: cleanup: function prototype cleanups
      sched: fix: move the CPU check into ->task_new_fair()
      sched: update comment
      sched: clean up is_migration_thread()
      sched: do not normalize kernel threads via SysRq-N
      sched: do not wakeup-preempt with SCHED_BATCH tasks
      sched: speed up context-switches a bit
      sched: reintroduce cache-hot affinity
      sched: debug: increase width of debug line
      sched: debug, improve migration statistics
      sched: allow the immediate migration of cache-cold tasks
      sched: reintroduce topology.h tunings
      sched: enable wake-idle on CONFIG_SCHED_MC=y
      sched: affine sync wakeups
      sched: sync wakeups preempt too

Laurent Vivier (4):
      sched: guest CPU accounting: add guest-CPU /proc/stat field
      sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields
      sched: guest CPU accounting: maintain stats in account_system_time()
      sched: guest CPU accounting: maintain guest state in KVM

Matthias Kaehlcke (1):
      sched: use list_for_each_entry_safe() in __wake_up_common()

Mike Galbraith (4):
      sched: fix SMP migration latencies
      sched: fix formatting of /proc/sched_debug
      sched: cleanup, remove the TASK_NONINTERACTIVE flag
      sched: prevent wakeup over-scheduling

Milton Miller (5):
      sched: domain sysctl fixes: use kcalloc()
      sched: domain sysctl fixes: use for_each_online_cpu()
      sched: domain sysctl fixes: unregister the sysctl table before domains
      sched: domain sysctl fixes: do not crash on allocation failure
      sched: domain sysctl fixes: add terminator comment

Paul E. McKenney (1):
      sched: export cpu_clock()

Peter Zijlstra (16):
      sched: simplify SCHED_FEAT_* code
      sched: new task placement for vruntime
      sched: simplify adaptive latency
      sched: clean up new task placement
      sched: add tree based averages
      sched: handle vruntime 64-bit overflow
      sched: better min_vruntime tracking
      sched: add vslice
      sched debug: check spread
      sched: max_vruntime() simplification
      sched: clean up min_vruntime use
      sched: speed up and simplify vslice calculations
      sched: another wakeup_granularity fix
      sched: disable sleeper_fairness on SCHED_BATCH
      sched: disable forced preemption by default
      sched: activate task_hot() only on fair-scheduled tasks

S.Caglar Onur (1):
      sched debug: BKL usage statistics, fix

Srivatsa Vaddagiri (13):
      sched: group-scheduler core
      sched: revert recent removal of set_curr_task()
      sched: fix minor bug in yield
      sched: print nr_running and load in /proc/sched_debug
      sched: print &rq->cfs stats
      sched: clean up code under CONFIG_FAIR_GROUP_SCHED
      sched: add fair-user scheduler
      sched: group scheduler wakeup latency fix
      sched: group scheduler SMP migration fix
      sched: group scheduler, fix coding style issues
      sched: group scheduler, fix bloat
      sched: group scheduler, fix latency
      sched: generate uevents for user creation/destruction

Zou Nan hai (1):
      sched: some proc entries are missed in sched_domain sys_ctl debug code

 Documentation/sched-design-CFS.txt |   67 +
 arch/i386/Kconfig                  |   11 
 drivers/kvm/kvm.h                  |   10 
 drivers/kvm/kvm_main.c             |    2 
 fs/pipe.c                          |    9 
 fs/proc/array.c                    |   17 
 fs/proc/base.c                     |    2 
 fs/proc/proc_misc.c                |   15 
 include/linux/kernel_stat.h        |    1 
 include/linux/sched.h              |   99 +-
 include/linux/topology.h           |    5 
 init/Kconfig                       |   21 
 kernel/delayacct.c                 |    2 
 kernel/exit.c                      |    6 
 kernel/fork.c                      |    3 
 kernel/ksysfs.c                    |    8 
 kernel/sched.c                     | 1444 +++++++++++++++++++++----------------
 kernel/sched_debug.c               |  282 ++++---
 kernel/sched_fair.c                |  811 ++++++++------------
 kernel/sched_idletask.c            |    8 
 kernel/sched_rt.c                  |   19 
 kernel/sched_stats.h               |   28 
 kernel/sysctl.c                    |   37 
 kernel/user.c                      |  249 ++++++
 net/unix/af_unix.c                 |    4 
 25 files changed, 1872 insertions(+), 1288 deletions(-)


* Re: [git pull] scheduler updates for v2.6.24
  2007-10-15 14:17 [git pull] scheduler updates for v2.6.24 Ingo Molnar
  2007-10-15 15:04 ` Ingo Molnar
@ 2007-10-15 18:35 ` Andrew Morton
  2007-10-15 18:53   ` Ingo Molnar
  2007-10-16  2:38 ` Nick Piggin
  2007-10-16 10:04 ` Thomas Backlund
  3 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2007-10-15 18:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: torvalds, linux-kernel

On Mon, 15 Oct 2007 16:17:23 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> Linus, please pull the latest scheduler git tree from:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git

Did Paul Jackson's crash get fixed?


* Re: [git pull] scheduler updates for v2.6.24
  2007-10-15 18:35 ` Andrew Morton
@ 2007-10-15 18:53   ` Ingo Molnar
  2007-10-16 22:13     ` Gabriel C
  2007-10-16 22:38     ` Dmitry Adamushko
  0 siblings, 2 replies; 13+ messages in thread
From: Ingo Molnar @ 2007-10-15 18:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, linux-kernel


* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 15 Oct 2007 16:17:23 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > Linus, please pull the latest scheduler git tree from:
> > 
> >    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
> 
> Did Paul Jackson's crash get fixed?

yes - that crash was a showstopper that was holding up the pull request 
for 2 days. Paul bisected it down to the culprit and the fix was to do 
this in wake_up_new_task():

-       if (!p->sched_class->task_new || !current->se.on_rq) {
+       if (!p->sched_class->task_new || !current->se.on_rq || !rq->cfs.curr) {

(during early bootup the cfs_rq has no curr pointer yet.) It's not clear 
why this race did not trigger earlier. (and the two checks can probably 
be consolidated into a single "!rq->cfs.curr" condition.)
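
The guard in the diff above can be modelled in user space as follows. This is only a sketch: the structures are cut down to the fields named in the diff, and use_task_new_path() and noop_task_new() are hypothetical helpers, not kernel functions. What the two branches of the real if statement do is not shown in this thread, so the sketch only reports which branch would be taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures named in the diff. */
struct sched_entity { int on_rq; };
struct task_struct;
struct sched_class { void (*task_new)(struct task_struct *p); };
struct task_struct {
	const struct sched_class *sched_class;
	struct sched_entity se;
};
struct cfs_rq { struct sched_entity *curr; };
struct rq { struct cfs_rq cfs; };

/* Hypothetical no-op hook so a class can have a non-NULL task_new. */
void noop_task_new(struct task_struct *p)
{
	(void)p;
}

/*
 * Returns false when the condition from the patched line is true,
 * i.e. when the class-specific task_new path must be avoided:
 * no task_new hook, parent not on the runqueue, or no curr entity yet.
 */
bool use_task_new_path(const struct rq *rq, const struct task_struct *p,
		       const struct task_struct *current_task)
{
	assert(rq != NULL && p != NULL && current_task != NULL);

	if (!p->sched_class->task_new || !current_task->se.on_rq ||
	    !rq->cfs.curr)
		return false;

	return true;
}
```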

	Ingo


* Re: [git pull] scheduler updates for v2.6.24
  2007-10-15 14:17 [git pull] scheduler updates for v2.6.24 Ingo Molnar
  2007-10-15 15:04 ` Ingo Molnar
  2007-10-15 18:35 ` Andrew Morton
@ 2007-10-16  2:38 ` Nick Piggin
  2007-10-16 10:04 ` Thomas Backlund
  3 siblings, 0 replies; 13+ messages in thread
From: Nick Piggin @ 2007-10-16  2:38 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, linux-kernel, Andrew Morton

On Tuesday 16 October 2007 00:17, Ingo Molnar wrote:
> Linus, please pull the latest scheduler git tree from:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
>
> It contains lots of scheduler updates from lots of people - hopefully
> the last big one for quite some time. Most of the focus was on
> performance (both micro-performance and scalability/balancing), but
> there's the fair-scheduling feature now Kconfig selectable too. Find the
> shortlog below.

Nice work...

However it's a pity all the balancing stuff got wildly changed
in 2.6.23 and then somewhat changed back again now.

Despite appearances, a lot of those things weren't actually
*completely* arbitrary values. I fear that it will make finding
performance regressions harder than it should be...

Anyway.



* Re: [git pull] scheduler updates for v2.6.24
  2007-10-15 14:17 [git pull] scheduler updates for v2.6.24 Ingo Molnar
                   ` (2 preceding siblings ...)
  2007-10-16  2:38 ` Nick Piggin
@ 2007-10-16 10:04 ` Thomas Backlund
  2007-10-16 10:08   ` Ingo Molnar
  3 siblings, 1 reply; 13+ messages in thread
From: Thomas Backlund @ 2007-10-16 10:04 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

Ingo Molnar wrote:
> Linus, please pull the latest scheduler git tree from:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
> 
> It contains lots of scheduler updates from lots of people - hopefully 
> the last big one for quite some time. Most of the focus was on 
> performance (both micro-performance and scalability/balancing), but 
> there's the fair-scheduling feature now Kconfig selectable too. Find the 
> shortlog below.
> 


How does this one compare to the v22 you released earlier ?

I'm thinking of backporting any fixes/optimizations to 2.6.22
(and possibly 2.6.23)

--
Thomas


* Re: [git pull] scheduler updates for v2.6.24
  2007-10-16 10:04 ` Thomas Backlund
@ 2007-10-16 10:08   ` Ingo Molnar
  2007-10-16 10:12     ` Ingo Molnar
  0 siblings, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2007-10-16 10:08 UTC (permalink / raw)
  To: Thomas Backlund; +Cc: linux-kernel


* Thomas Backlund <tmb@mandriva.org> wrote:

> How does this one compare to the v22 you released earlier ?

v22 has most of it included.

> I'm thinking of backporting any fixes/optimizations to 2.6.22 (and 
> possibly 2.6.23)

i have already backported it as v22.1 - will release it within a few 
days. (once the currently open regressions have been fixed)

	Ingo


* Re: [git pull] scheduler updates for v2.6.24
  2007-10-16 10:08   ` Ingo Molnar
@ 2007-10-16 10:12     ` Ingo Molnar
  2007-10-16 11:00       ` Thomas Backlund
  0 siblings, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2007-10-16 10:12 UTC (permalink / raw)
  To: Thomas Backlund; +Cc: linux-kernel


* Ingo Molnar <mingo@elte.hu> wrote:

> * Thomas Backlund <tmb@mandriva.org> wrote:
> 
> > How does this one compare to the v22 you released earlier ?
> 
> v22 has most of it included.
> 
> > I'm thinking of backporting any fixes/optimizations to 2.6.22 (and 
> > possibly 2.6.23)
> 
> i have already backported it as v22.1 - will release it within a few 
> days. (once the currently open regressions have been fixed)

i've uploaded what i have at the moment, to:

   http://people.redhat.com/mingo/cfs-scheduler/devel/sched-cfs-v2.6.23.1-v22.1-rc0.patch

	Ingo


* Re: [git pull] scheduler updates for v2.6.24
  2007-10-16 10:12     ` Ingo Molnar
@ 2007-10-16 11:00       ` Thomas Backlund
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Backlund @ 2007-10-16 11:00 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
>> * Thomas Backlund <tmb@mandriva.org> wrote:
>>
>>> How does this one compare to the v22 you released earlier ?
>> v22 has most of it included.
>>

OK, that's what I thought

>>> I'm thinking of backporting any fixes/optimizations to 2.6.22 (and 
>>> possibly 2.6.23)
>> i have already backported it as v22.1 - will release it within a few 
>> days. (once the currently open regressions have been fixed)
> 

OK

> i've uploaded what i have at the moment, to:
> 
>    http://people.redhat.com/mingo/cfs-scheduler/devel/sched-cfs-v2.6.23.1-v22.1-rc0.patch
> 
> 	Ingo

Big thanks for your work...

Now I just have to see if I can get it to work with the -hrt series and 
I'm really happy ;-)

--
Thomas


* Re: [git pull] scheduler updates for v2.6.24
  2007-10-15 18:53   ` Ingo Molnar
@ 2007-10-16 22:13     ` Gabriel C
  2007-10-16 23:31       ` Dmitry Adamushko
  2007-10-16 22:38     ` Dmitry Adamushko
  1 sibling, 1 reply; 13+ messages in thread
From: Gabriel C @ 2007-10-16 22:13 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, torvalds, linux-kernel

Ingo Molnar wrote:
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
>> On Mon, 15 Oct 2007 16:17:23 +0200
>> Ingo Molnar <mingo@elte.hu> wrote:
>>
>>> Linus, please pull the latest scheduler git tree from:
>>>
>>>    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
>> Did Paul Jackson's crash get fixed?
> 
> yes - that crash was a showstopper that was holding up the pull request 
> for 2 days. Paul bisected it down to the culprit and the fix was to do 
> this in wake_up_new_task():
> 
> -       if (!p->sched_class->task_new || !current->se.on_rq) {
> +       if (!p->sched_class->task_new || !current->se.on_rq || !rq->cfs.curr) {
> 
> (during early bootup the cfs_rq has no curr pointer yet.) It's not clear 
> why this race did not trigger earlier. (and the two checks can probably 
> be consolidated into a single "!rq->cfs.curr" condition.)

This may be unrelated, but my box now dies after this merge.

When the box is mostly idle I get about 6h of uptime; when it is doing some work (compiling etc.) it freezes at random.

I was finally able to capture the Oops:

...

[15692.917111] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000044
[15692.917159]  printing eip:
[15692.917174] c0111f90
[15692.917185] *pde = 00000000
[15692.917200] Oops: 0000 [#1]
[15692.917208] PREEMPT SMP
[15692.917240] Modules linked in: fuse netconsole configfs pc87360 hwmon_vid eeprom adm1021 uhci_hcd sr_mod shpchp pci_hotplug ohci_hcd iTCO_wdt iTCO_vendor_support intel_agp i82860_edac i2c_i801 ehci_hcd usbcore edac_core cdrom agpgart 3c59x mii ext4dev jbd2 capability commoncap loop lp parport_pc parport evdev
[15692.917623] CPU:    0
[15692.917625] EIP:    0060:[<c0111f90>]    Not tainted VLI
[15692.917629] EFLAGS: 00010046   (2.6.23-g65a6ec0d #330)
[15692.917661] EIP is at pick_next_task_fair+0x1f/0x2d
[15692.917672] eax: c150a7b8   ebx: 00000000   ecx: 00000000   edx: 00000000
[15692.917689] esi: c1507a48   edi: 00000000   ebp: 00eaaf7a   esp: cb1fdf14
[15692.917701] ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
[15692.917715] Process sed (pid: 28999, ti=cb1fc000 task=cfdc3500 task.ti=cb1fc000)
[15692.917725] Stack: c02f8268 c02ef7b5 00000002 cb1fdf58 cb1fdf50 00000000 c0400f38 c0403780
[15692.917833]        cfdc3500 cfdc3634 c150a780 00000000 c011a8e7 00000000 c1077aa0 000000ff
[15692.917942]        00000000 00000000 00000000 cb1fdf8c 00000010 cfdc3500 cb1fdf8c c011ace5
[15692.918048] Call Trace:
[15692.918072]  [<c02ef7b5>] schedule+0x321/0x58f
[15692.918109]  [<c011a8e7>] do_exit+0x293/0x6c6
[15692.918143]  [<c011ace5>] do_exit+0x691/0x6c6
[15692.918169]  [<c011ad87>] sys_exit_group+0x0/0xd
[15692.918195]  [<c01026e6>] sysenter_past_esp+0x5f/0x85
[15692.918232]  =======================
[15692.918244] Code: 8b 53 28 89 43 34 89 53 38 5b 5e c3 53 31 d2 83 78 40 00 74 20 83 c0 38 8b 50 20 31 db 85 d2 74 0a 8d 5a f8 89 da e8 a9 ff ff ff <8b> 43 44 85 c0 75 e6 8d 53 d0 89 d0 5b c3 57 56 53 89 c6 89 d7
[15692.918981] EIP: [<c0111f90>] pick_next_task_fair+0x1f/0x2d SS:ESP 0068:cb1fdf14

...

After that the box is dead and needs a hard reset.
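
One hedged reading of the register dump above: the faulting instruction bytes "<8b> 43 44" decode as a 32-bit load from ebx + 0x44, and ebx is 0, which matches the reported fault address 00000044. In other words, the code appears to have read a member at offset 0x44 through a NULL pointer. The sketch below only shows that arithmetic; struct demo_entity and its layout are made up for illustration and do not reflect the real kernel structure involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure with one member placed at offset 0x44. */
struct demo_entity {
	unsigned char pad[0x44];
	uint32_t member;
};

/*
 * Compute the address the CPU would touch when reading base->member,
 * without actually dereferencing the pointer.
 */
uintptr_t member_address(uintptr_t base)
{
	/* The illustration relies on the member sitting at 0x44. */
	assert(offsetof(struct demo_entity, member) == 0x44);

	return base + offsetof(struct demo_entity, member);
}
```

With a NULL base the computed address is the member offset itself, which is why small fault addresses like this one usually point at a NULL structure pointer.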

Interestingly, when I compile the kernel with debugging enabled I don't get this (or maybe it just takes longer to trigger?).

Config, lspci, dmesg, hardware specs, the Oops message, and the top output from when it Oopsed are here:


http://194.231.229.228/lara/

> 
> 	Ingo

Regards,

Gabriel


* Re: [git pull] scheduler updates for v2.6.24
  2007-10-15 18:53   ` Ingo Molnar
  2007-10-16 22:13     ` Gabriel C
@ 2007-10-16 22:38     ` Dmitry Adamushko
  1 sibling, 0 replies; 13+ messages in thread
From: Dmitry Adamushko @ 2007-10-16 22:38 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, torvalds, linux-kernel, Paul Jackson

On 15/10/2007, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > On Mon, 15 Oct 2007 16:17:23 +0200
> > Ingo Molnar <mingo@elte.hu> wrote:
> >
> > > Linus, please pull the latest scheduler git tree from:
> > >
> > >    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
> >
> > Did Paul Jackson's crash get fixed?
>
> yes - that crash was a showstopper that was holding up the pull request
> for 2 days. Paul bisected it down to the culprit and the fix was to do
> this in wake_up_new_task():
>
> -       if (!p->sched_class->task_new || !current->se.on_rq) {
> +       if (!p->sched_class->task_new || !current->se.on_rq || !rq->cfs.curr) {
>
> (during early bootup the cfs_rq has no curr pointer yet.) It's not clear
> why this race did not trigger earlier.

an update on this issue:

in short, SD_BALANCE_FORK is required to trigger this problem and
hence only NUMA machines could have been affected by it (and only
ia64 and x86 have SD_BALANCE_FORK in SD_NODE_INIT).

more details:

it's perfectly legitimate for 'rq->cfs.curr' to be NULL in
task_new_fair() in the case when this_cpu != task_cpu(p) (p -- is a
newly created task).

why this_cpu != task_cpu(p) :

do_fork() --> copy_process() --> sched_fork() -->
cpu = sched_balance_self(this_cpu, SD_BALANCE_FORK)

chooses a different cpu for the new task, and there is _no_
'class_sched_fair' task running on that cpu at the moment (that's why
rq->cfs.curr == NULL).

[ thanks a lot to Paul for providing debugging information ]

btw., it's not the 'curr->vruntime < se->vruntime' part in
task_new_fair() that gave us the oops (it's only executed in the case
of this_cpu == task_cpu(p)) _but_ it's rather:

[*] check_spread(cfs_rq, curr) which also accesses 'curr->vruntime'.

> (and the two checks can probably
> be consolidated into a single "!rq->cfs.curr" condition.)

2 checks are required as 'current' and rq->cfs.curr are not the same :-)
It should also work if we just get rid of [*] or add an additional
(curr != NULL) check there.

just as an additional observation:

there are lots of per-cpu threads (like events/cpu, ksoftirqd/cpu,
etc.) being created on start-up (x NUMBER_OF_CPUS) and SD_BALANCE_FORK
(actually, sched_balance_self() from sched_fork()) is just an overhead
in this case...
although, sched_balance_self() is likely to be responsible for only a
minor % of the time taken to create a new context, so optimizing it away
(esp. for some corner cases) won't improve the start-up time
noticeably.


>
>         Ingo
> -

-- 
Best regards,
Dmitry Adamushko


* Re: [git pull] scheduler updates for v2.6.24
  2007-10-16 22:13     ` Gabriel C
@ 2007-10-16 23:31       ` Dmitry Adamushko
  2007-10-16 23:50         ` Gabriel C
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Adamushko @ 2007-10-16 23:31 UTC (permalink / raw)
  To: Gabriel C, Srivatsa Vaddagiri
  Cc: Ingo Molnar, Andrew Morton, torvalds, linux-kernel

[ cc'ed Srivatsa ]

On 17/10/2007, Gabriel C <nix.or.die@googlemail.com> wrote:
> Ingo Molnar wrote:
> [15692.917111] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000044
> ...
> [15692.917629] EFLAGS: 00010046   (2.6.23-g65a6ec0d #330)
> [15692.917661] EIP is at pick_next_task_fair+0x1f/0x2d

Gabriel, could you please post the disassembled code for pick_next_task_fair()?
(objdump -d kernel/sched.o and then search for pick_next_task_fair --
copy and paste)

anyway, my guess is that it's :

                se = pick_next_entity(cfs_rq);
                cfs_rq = group_cfs_rq(se);

'se' _happens_ to be NULL and group_cfs_rq(se) does se->my_q and
(according to my calculations) offset(my_q) == 68 (0x44) for x86 32bit
system with CONFIG_SCHEDSTATS=n and CONFIG_FAIR_GROUP_SCHED=y
(according to the config).

that might take place provided put_prev_task_fair() failed for some
reason to insert 'current' (or its corresponding group element) back
into the tree in schedule()... say, due to some inconsistency in
cfs_rq's data.

Srivatsa, that's somewhat similar to another issue that has been
posted earlier today (crash in put_prev_task_fair() -->
__enqueue_task() --> rb_insert_color()) that you are already aware of
...  (/me will continue tomorrow).


-- 
Best regards,
Dmitry Adamushko


* Re: [git pull] scheduler updates for v2.6.24
  2007-10-16 23:31       ` Dmitry Adamushko
@ 2007-10-16 23:50         ` Gabriel C
  0 siblings, 0 replies; 13+ messages in thread
From: Gabriel C @ 2007-10-16 23:50 UTC (permalink / raw)
  To: Dmitry Adamushko
  Cc: Srivatsa Vaddagiri, Ingo Molnar, Andrew Morton, torvalds,
	linux-kernel

Dmitry Adamushko wrote:
> [ cc'ed Srivatsa ]
> 
> On 17/10/2007, Gabriel C <nix.or.die@googlemail.com> wrote:
>> Ingo Molnar wrote:
>> [15692.917111] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000044
>> ...
>> [15692.917629] EFLAGS: 00010046   (2.6.23-g65a6ec0d #330)
>> [15692.917661] EIP is at pick_next_task_fair+0x1f/0x2d
> 
> Gabriel, could you please post the disassembled code for pick_next_task_fair()?
> (objdump -d kernel/sched.o and then search for pick_next_task_fair --
> copy and paste)

Sure here it is :

00000e49 <pick_next_task_fair>:
     e49:       53                      push   %ebx
     e4a:       31 d2                   xor    %edx,%edx
     e4c:       83 78 40 00             cmpl   $0x0,0x40(%eax)
     e50:       74 20                   je     e72 <pick_next_task_fair+0x29>
     e52:       83 c0 38                add    $0x38,%eax
     e55:       8b 50 20                mov    0x20(%eax),%edx
     e58:       31 db                   xor    %ebx,%ebx
     e5a:       85 d2                   test   %edx,%edx
     e5c:       74 0a                   je     e68 <pick_next_task_fair+0x1f>
     e5e:       8d 5a f8                lea    -0x8(%edx),%ebx
     e61:       89 da                   mov    %ebx,%edx
     e63:       e8 a9 ff ff ff          call   e11 <set_next_entity>
     e68:       8b 43 44                mov    0x44(%ebx),%eax
     e6b:       85 c0                   test   %eax,%eax
     e6d:       75 e6                   jne    e55 <pick_next_task_fair+0xc>
     e6f:       8d 53 d0                lea    -0x30(%ebx),%edx
     e72:       89 d0                   mov    %edx,%eax
     e74:       5b                      pop    %ebx
     e75:       c3                      ret


> 
> anyway, my guess is that it's :
> 
>                 se = pick_next_entity(cfs_rq);
>                 cfs_rq = group_cfs_rq(se);
> 
> 'se' _happens_ to be NULL and group_cfs_rq(se) does se->my_q and
> (according to my calculations) offset(my_q) == 68 (0x44) for x86 32bit
> system with CONFIG_SCHEDSTATS=n and CONFIG_FAIR_GROUP_SCHED=y
> (according to the config).
> 
> that might take place provided put_prev_task_fair() failed for some
> reason to insert 'current' (or its corresponding group element) back
> into the tree in schedule()... say, due to some inconsistency in
> cfs_rq's data.
> 
> Srivatsa, that's somewhat similar to another issue that has been
> posted earlier today (crash in put_prev_task_fair() -->
> __enqueue_task() --> rb_insert_color()) that you are already aware of
> ...  (/me will continue tomorrow).
> 
> 


Thread overview: 13+ messages
2007-10-15 14:17 [git pull] scheduler updates for v2.6.24 Ingo Molnar
2007-10-15 15:04 ` Ingo Molnar
2007-10-15 18:35 ` Andrew Morton
2007-10-15 18:53   ` Ingo Molnar
2007-10-16 22:13     ` Gabriel C
2007-10-16 23:31       ` Dmitry Adamushko
2007-10-16 23:50         ` Gabriel C
2007-10-16 22:38     ` Dmitry Adamushko
2007-10-16  2:38 ` Nick Piggin
2007-10-16 10:04 ` Thomas Backlund
2007-10-16 10:08   ` Ingo Molnar
2007-10-16 10:12     ` Ingo Molnar
2007-10-16 11:00       ` Thomas Backlund
