public inbox for linux-kernel@vger.kernel.org
* possible migration bug with hotplug cpu
@ 2009-07-08 15:48 Lucas De Marchi
  2009-07-08 15:55 ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Lucas De Marchi @ 2009-07-08 15:48 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

I was doing some analysis of the number of migrations in my application, and
I think there's a bug in this accounting or, even worse, in the migration
mechanism when used together with cpu hotplug.

I turned off all CPUs except one using the hotplug mechanism, after which I
launched my application, which has 8 threads. Before they finish, the threads
print their /proc/<tid>/sched files. I have only 1 online CPU, yet there are
~200 migrations per thread. The function set_task_cpu is responsible for
updating the migration counter and is called from 9 other functions. With some
tests I discovered that 95% of these migrations come from try_to_wake_up and
the other 5% from pull_task and __migrate_task.

Looking at try_to_wake_up:

...
	cpu = task_cpu(p);
	orig_cpu = cpu;
	this_cpu = smp_processor_id();

#ifdef CONFIG_SMP
	if (unlikely(task_running(rq, p)))
		goto out_activate;

	cpu = p->sched_class->select_task_rq(p, sync);  //<<<<===
	if (cpu != orig_cpu) {                          //<<<<===
		set_task_cpu(p, cpu);
...

p->sched_class->select_task_rq(p, sync) is returning a cpu different from
task_cpu(p) even though I have only 1 online CPU. In my tests this behavior is
similar for rt and normal tasks. For RT, the only possible problem could be in
find_lowest_rq, but I'm still trying to find out why. Since you have more
experience with this code, I'd appreciate it if you could give it a look.

Is there any obscure reason why this behavior could be right?

Lucas De Marchi

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: possible migration bug with hotplug cpu
  2009-07-08 15:48 possible migration bug with hotplug cpu Lucas De Marchi
@ 2009-07-08 15:55 ` Peter Zijlstra
  2009-07-08 16:05   ` Lucas De Marchi
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2009-07-08 15:55 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: Ingo Molnar, linux-kernel

On Wed, 2009-07-08 at 17:48 +0200, Lucas De Marchi wrote:
> I was doing some analysis of the number of migrations in my application, and
> I think there's a bug in this accounting or, even worse, in the migration
> mechanism when used together with cpu hotplug.
> 
> I turned off all CPUs except one using the hotplug mechanism, after which I
> launched my application, which has 8 threads. Before they finish, the threads
> print their /proc/<tid>/sched files. I have only 1 online CPU, yet there are
> ~200 migrations per thread. The function set_task_cpu is responsible for
> updating the migration counter and is called from 9 other functions. With some
> tests I discovered that 95% of these migrations come from try_to_wake_up and
> the other 5% from pull_task and __migrate_task.
> 
> Looking at try_to_wake_up:
> 
> ....
> 	cpu = task_cpu(p);
> 	orig_cpu = cpu;
> 	this_cpu = smp_processor_id();
> 
> #ifdef CONFIG_SMP
> 	if (unlikely(task_running(rq, p)))
> 		goto out_activate;
> 
> 	cpu = p->sched_class->select_task_rq(p, sync);  //<<<<===
> 	if (cpu != orig_cpu) {                          //<<<<===
> 		set_task_cpu(p, cpu);
> ....
> 
> p->sched_class->select_task_rq(p, sync) is returning a cpu different from
> task_cpu(p) even though I have only 1 online CPU. In my tests this behavior is
> similar for rt and normal tasks. For RT, the only possible problem could be in
> find_lowest_rq, but I'm still trying to find out why. Since you have more
> experience with this code, I'd appreciate it if you could give it a look.
> 
> Is there any obscure reason why this behavior could be right?

If the task last ran on a now-unplugged cpu, this would be correct; is
that indeed what happens?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: possible migration bug with hotplug cpu
  2009-07-08 15:55 ` Peter Zijlstra
@ 2009-07-08 16:05   ` Lucas De Marchi
  2009-07-08 16:39     ` Lucas De Marchi
  0 siblings, 1 reply; 9+ messages in thread
From: Lucas De Marchi @ 2009-07-08 16:05 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel

No, because the tasks are started only after the CPUs have been taken offline.

Lucas De Marchi


On Wed, Jul 8, 2009 at 17:55, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, 2009-07-08 at 17:48 +0200, Lucas De Marchi wrote:
> > I was doing some analysis of the number of migrations in my application, and
> > I think there's a bug in this accounting or, even worse, in the migration
> > mechanism when used together with cpu hotplug.
> >
> > I turned off all CPUs except one using the hotplug mechanism, after which I
> > launched my application, which has 8 threads. Before they finish, the threads
> > print their /proc/<tid>/sched files. I have only 1 online CPU, yet there are
> > ~200 migrations per thread. The function set_task_cpu is responsible for
> > updating the migration counter and is called from 9 other functions. With some
> > tests I discovered that 95% of these migrations come from try_to_wake_up and
> > the other 5% from pull_task and __migrate_task.
> >
> > Looking at try_to_wake_up:
> >
> > ....
> >       cpu = task_cpu(p);
> >       orig_cpu = cpu;
> >       this_cpu = smp_processor_id();
> >
> > #ifdef CONFIG_SMP
> >       if (unlikely(task_running(rq, p)))
> >               goto out_activate;
> >
> >       cpu = p->sched_class->select_task_rq(p, sync);  //<<<<===
> >       if (cpu != orig_cpu) {                          //<<<<===
> >               set_task_cpu(p, cpu);
> > ....
> >
> > p->sched_class->select_task_rq(p, sync) is returning a cpu different from
> > task_cpu(p) even though I have only 1 online CPU. In my tests this behavior is
> > similar for rt and normal tasks. For RT, the only possible problem could be in
> > find_lowest_rq, but I'm still trying to find out why. Since you have more
> > experience with this code, I'd appreciate it if you could give it a look.
> >
> > Is there any obscure reason why this behavior could be right?
>
> If the task last ran on a now-unplugged cpu, this would be correct; is
> that indeed what happens?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: possible migration bug with hotplug cpu
  2009-07-08 16:05   ` Lucas De Marchi
@ 2009-07-08 16:39     ` Lucas De Marchi
  2009-07-09 11:57       ` Lucas De Marchi
  0 siblings, 1 reply; 9+ messages in thread
From: Lucas De Marchi @ 2009-07-08 16:39 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel

Below is a piece of /proc/<pid>/sched for an RT task, running with only one
processor online:

se.wait_count                      :                 1466
sched_info.bkl_count               :                    0
se.nr_migrations                   :                  289  <<=========
se.nr_migrations_cold              :                    0
se.nr_failed_migrations_affine     :                    0
se.nr_failed_migrations_running    :                    7
se.nr_failed_migrations_hot        :                    3
se.nr_forced_migrations            :                    1
se.nr_forced2_migrations           :                   86
se.nr_wakeups                      :               151347
se.nr_wakeups_sync                 :                  298
se.nr_wakeups_migrate              :                  265
se.nr_wakeups_local                :               150516
se.nr_wakeups_remote               :                  831
se.nr_wakeups_affine               :                  253
se.nr_wakeups_affine_attempts      :                 1092
se.nr_wakeups_passive              :                    8
se.nr_wakeups_idle                 :                    0
avg_atom                           :             0.002887
avg_per_cpu                        :             1.498609
nr_switches                        :               150001
nr_voluntary_switches              :               150001
nr_involuntary_switches            :                    0
se.load.weight                     :               177522
policy                             :                    1
prio                               :                   89  <<=========
clock-delta                        :                   84


At http://pastebin.com/pastebin.php?dl=m7c226875 there's the
/proc/sched_debug output from before and after running the test.


Lucas De Marchi


On Wed, Jul 8, 2009 at 18:05, Lucas De Marchi <lucas.de.marchi@gmail.com> wrote:
>
> No, because the tasks are started only after the CPUs have been taken offline.
>
> Lucas De Marchi
>
>
> On Wed, Jul 8, 2009 at 17:55, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, 2009-07-08 at 17:48 +0200, Lucas De Marchi wrote:
> > > I was doing some analysis of the number of migrations in my application, and
> > > I think there's a bug in this accounting or, even worse, in the migration
> > > mechanism when used together with cpu hotplug.
> > >
> > > I turned off all CPUs except one using the hotplug mechanism, after which I
> > > launched my application, which has 8 threads. Before they finish, the threads
> > > print their /proc/<tid>/sched files. I have only 1 online CPU, yet there are
> > > ~200 migrations per thread. The function set_task_cpu is responsible for
> > > updating the migration counter and is called from 9 other functions. With some
> > > tests I discovered that 95% of these migrations come from try_to_wake_up and
> > > the other 5% from pull_task and __migrate_task.
> > >
> > > Looking at try_to_wake_up:
> > >
> > > ....
> > >       cpu = task_cpu(p);
> > >       orig_cpu = cpu;
> > >       this_cpu = smp_processor_id();
> > >
> > > #ifdef CONFIG_SMP
> > >       if (unlikely(task_running(rq, p)))
> > >               goto out_activate;
> > >
> > >       cpu = p->sched_class->select_task_rq(p, sync);  //<<<<===
> > >       if (cpu != orig_cpu) {                          //<<<<===
> > >               set_task_cpu(p, cpu);
> > > ....
> > >
> > > p->sched_class->select_task_rq(p, sync) is returning a cpu different from
> > > task_cpu(p) even though I have only 1 online CPU. In my tests this behavior is
> > > similar for rt and normal tasks. For RT, the only possible problem could be in
> > > find_lowest_rq, but I'm still trying to find out why. Since you have more
> > > experience with this code, I'd appreciate it if you could give it a look.
> > >
> > > Is there any obscure reason why this behavior could be right?
> >
> > If the task last ran on a now-unplugged cpu, this would be correct; is
> > that indeed what happens?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: possible migration bug with hotplug cpu
  2009-07-08 16:39     ` Lucas De Marchi
@ 2009-07-09 11:57       ` Lucas De Marchi
  2009-07-09 12:24         ` Peter Zijlstra
  2009-07-10 10:41         ` [tip:sched/urgent] sched: Reset sched stats on fork() tip-bot for Lucas De Marchi
  0 siblings, 2 replies; 9+ messages in thread
From: Lucas De Marchi @ 2009-07-09 11:57 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel

So, I found the problem.

These fields are currently not initialized upon fork. I noticed this when I
updated to 2.6.31-rc2; commit 6c594c21fcb02c662f11c97be4d7d2b73060a205 was
merged into the kernel by Ingo (not yet present in 2.6.30), but it only
initializes nr_migrations. Why are the other fields not initialized to 0? Even
when there are more processors, these fields may be wrong if not zeroed when a
new task is started. Below is a quick way to fix it. This fixed the counters for me.

What do you think of creating a struct sched_statistics embedded into
sched_entity so we could memset it to zero all at once? All fields in the
CONFIG_SCHEDSTATS section should be initialized, right?

Lucas De Marchi


diff --git a/kernel/sched.c b/kernel/sched.c
index fd3ac58..b8d75cf 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2572,15 +2572,37 @@ static void __sched_fork(struct task_struct *p)
 	p->se.avg_wakeup		= sysctl_sched_wakeup_granularity;

 #ifdef CONFIG_SCHEDSTATS
-	p->se.wait_start		= 0;
-	p->se.sum_sleep_runtime		= 0;
-	p->se.sleep_start		= 0;
-	p->se.block_start		= 0;
-	p->se.sleep_max			= 0;
-	p->se.block_max			= 0;
-	p->se.exec_max			= 0;
-	p->se.slice_max			= 0;
-	p->se.wait_max			= 0;
+	p->se.wait_start			= 0;
+	p->se.wait_max				= 0;
+	p->se.wait_count			= 0;
+	p->se.wait_sum				= 0;
+
+	p->se.sleep_start			= 0;
+	p->se.sleep_max				= 0;
+	p->se.sum_sleep_runtime			= 0;
+
+	p->se.block_start			= 0;
+	p->se.block_max				= 0;
+	p->se.exec_max				= 0;
+	p->se.slice_max				= 0;
+
+	p->se.nr_migrations_cold		= 0;
+	p->se.nr_failed_migrations_affine	= 0;
+	p->se.nr_failed_migrations_running	= 0;
+	p->se.nr_failed_migrations_hot		= 0;
+	p->se.nr_forced_migrations		= 0;
+	p->se.nr_forced2_migrations		= 0;
+
+	p->se.nr_wakeups			= 0;
+	p->se.nr_wakeups_sync			= 0;
+	p->se.nr_wakeups_migrate		= 0;
+	p->se.nr_wakeups_local			= 0;
+	p->se.nr_wakeups_remote			= 0;
+	p->se.nr_wakeups_affine			= 0;
+	p->se.nr_wakeups_affine_attempts	= 0;
+	p->se.nr_wakeups_passive		= 0;
+	p->se.nr_wakeups_idle			= 0;
+
 #endif

 	INIT_LIST_HEAD(&p->rt.run_list);

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: possible migration bug with hotplug cpu
  2009-07-09 11:57       ` Lucas De Marchi
@ 2009-07-09 12:24         ` Peter Zijlstra
  2009-07-09 12:55           ` Lucas De Marchi
  2009-07-10 10:41         ` [tip:sched/urgent] sched: Reset sched stats on fork() tip-bot for Lucas De Marchi
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2009-07-09 12:24 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: Ingo Molnar, linux-kernel

On Thu, 2009-07-09 at 13:57 +0200, Lucas De Marchi wrote:
> So, I found the problem.
> 
> These fields are currently not initialized upon fork. I noticed this when I
> updated to 2.6.31-rc2; commit 6c594c21fcb02c662f11c97be4d7d2b73060a205 was
> merged into the kernel by Ingo (not yet present in 2.6.30), but it only
> initializes nr_migrations. Why are the other fields not initialized to 0? Even
> when there are more processors, these fields may be wrong if not zeroed when a
> new task is started. Below is a quick way to fix it. This fixed the counters for me.

Ah, awesome. I hadn't had time to look in detail yet. Thanks!

> What do you think of creating a struct sched_statistics embedded into
> sched_entity so we could memset it to zero all at once? All fields in the
> CONFIG_SCHEDSTATS section should be initialized, right?

Sounds like a sane plan.

One of the things I have wanted to do for a long time is to rename
sched_entity to sched_fair_entity, and do something like:

struct sched_entity {
	struct sched_common_entity       common;

	union {
		struct sched_fair_entity fair;
		struct sched_rt_entity   rt;
	};
};

I imagine we can add struct sched_statistics to the end of this as well.

The reason I haven't come around to doing this is that it takes a bit of
time to refactor the code and find all the common bits. So it's been on
my todo list like forever.

Can I persuade you to look at this? :-)

I'll queue the below, can I add:

Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com>

?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: possible migration bug with hotplug cpu
  2009-07-09 12:24         ` Peter Zijlstra
@ 2009-07-09 12:55           ` Lucas De Marchi
  2009-07-09 13:15             ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Lucas De Marchi @ 2009-07-09 12:55 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel

On Thu, Jul 9, 2009 at 14:24, Peter Zijlstra<peterz@infradead.org> wrote:
> One of the things I have wanted to do for a long time is to rename
> sched_entity to sched_fair_entity, and do something like:
>
> struct sched_entity {
>        struct sched_common_entity       common;
>
>        union {
>                struct sched_fair_entity fair;
>                struct sched_rt_entity   rt;
>        };
> };
>
> I imagine we can add struct sched_statistics to the end of this as well.
>
> The reason I haven't come around to doing this is that it takes a bit of
> time to refactor the code and find all the common bits. So it's been on
> my todo list like forever.
>
> Can I persuade you to look at this? :-)

It looks like a very good idea. As you said, it will take some time to refactor
the code. I can look at this, but only at the end of next week.

> I'll queue the below, can I add:
>
> Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
>
> ?

Sure. I'm not very used to it yet... should I re-send the email
or will you add it yourself?

thanks

Lucas De Marchi

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: possible migration bug with hotplug cpu
  2009-07-09 12:55           ` Lucas De Marchi
@ 2009-07-09 13:15             ` Peter Zijlstra
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2009-07-09 13:15 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: Ingo Molnar, linux-kernel

On Thu, 2009-07-09 at 14:55 +0200, Lucas De Marchi wrote:
> On Thu, Jul 9, 2009 at 14:24, Peter Zijlstra<peterz@infradead.org> wrote:
> > One of the things I have wanted to do for a long time is to rename
> > sched_entity to sched_fair_entity, and do something like:
> >
> > struct sched_entity {
> >        struct sched_common_entity       common;
> >
> >        union {
> >                struct sched_fair_entity fair;
> >                struct sched_rt_entity   rt;
> >        };
> > };
> >
> > I imagine we can add struct sched_statistics to the end of this as well.
> >
> > The reason I haven't come around to doing this is that it takes a bit of
> > time to refactor the code and find all the common bits. So it's been on
> > my todo list like forever.
> >
> > Can I persuade you to look at this? :-)
> 
> It looks like a very good idea. As you said, it will take some time to refactor
> the code. I can look at this, but only at the end of next week.

Sure, no hurry; it's been without that for like forever, so a few more
weeks won't harm anybody. Thanks!

> > I'll queue the below, can I add:
> >
> > Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
> >
> > ?
> 
> Sure. I'm not very used to it yet... should I re-send the email
> or will you add it yourself?

I've added it; simply remember to add it for future patches :-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [tip:sched/urgent] sched: Reset sched stats on fork()
  2009-07-09 11:57       ` Lucas De Marchi
  2009-07-09 12:24         ` Peter Zijlstra
@ 2009-07-10 10:41         ` tip-bot for Lucas De Marchi
  1 sibling, 0 replies; 9+ messages in thread
From: tip-bot for Lucas De Marchi @ 2009-07-10 10:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, lucas.de.marchi, tglx,
	mingo

Commit-ID:  7793527b90d9418211f4fe8464cc1dcb1631ea1b
Gitweb:     http://git.kernel.org/tip/7793527b90d9418211f4fe8464cc1dcb1631ea1b
Author:     Lucas De Marchi <lucas.de.marchi@gmail.com>
AuthorDate: Thu, 9 Jul 2009 13:57:20 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 10 Jul 2009 10:43:29 +0200

sched: Reset sched stats on fork()

The sched_stat fields are currently not reset upon fork.
Ingo's recent commit 6c594c21fcb02c662f11c97be4d7d2b73060a205
did reset nr_migrations, but it didn't reset any of the
others.

This patch resets all sched_stat fields on fork.

Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <193b0f820907090457s7a3662f4gcdecdc22fcae857b@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 kernel/sched.c |   40 +++++++++++++++++++++++++++++++---------
 1 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index a17f3d9..c4549bd 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2572,15 +2572,37 @@ static void __sched_fork(struct task_struct *p)
 	p->se.avg_wakeup		= sysctl_sched_wakeup_granularity;
 
 #ifdef CONFIG_SCHEDSTATS
-	p->se.wait_start		= 0;
-	p->se.sum_sleep_runtime		= 0;
-	p->se.sleep_start		= 0;
-	p->se.block_start		= 0;
-	p->se.sleep_max			= 0;
-	p->se.block_max			= 0;
-	p->se.exec_max			= 0;
-	p->se.slice_max			= 0;
-	p->se.wait_max			= 0;
+	p->se.wait_start			= 0;
+	p->se.wait_max				= 0;
+	p->se.wait_count			= 0;
+	p->se.wait_sum				= 0;
+
+	p->se.sleep_start			= 0;
+	p->se.sleep_max				= 0;
+	p->se.sum_sleep_runtime			= 0;
+
+	p->se.block_start			= 0;
+	p->se.block_max				= 0;
+	p->se.exec_max				= 0;
+	p->se.slice_max				= 0;
+
+	p->se.nr_migrations_cold		= 0;
+	p->se.nr_failed_migrations_affine	= 0;
+	p->se.nr_failed_migrations_running	= 0;
+	p->se.nr_failed_migrations_hot		= 0;
+	p->se.nr_forced_migrations		= 0;
+	p->se.nr_forced2_migrations		= 0;
+
+	p->se.nr_wakeups			= 0;
+	p->se.nr_wakeups_sync			= 0;
+	p->se.nr_wakeups_migrate		= 0;
+	p->se.nr_wakeups_local			= 0;
+	p->se.nr_wakeups_remote			= 0;
+	p->se.nr_wakeups_affine			= 0;
+	p->se.nr_wakeups_affine_attempts	= 0;
+	p->se.nr_wakeups_passive		= 0;
+	p->se.nr_wakeups_idle			= 0;
+
 #endif
 
 	INIT_LIST_HEAD(&p->rt.run_list);

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-07-10 10:42 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-07-08 15:48 possible migration bug with hotplug cpu Lucas De Marchi
2009-07-08 15:55 ` Peter Zijlstra
2009-07-08 16:05   ` Lucas De Marchi
2009-07-08 16:39     ` Lucas De Marchi
2009-07-09 11:57       ` Lucas De Marchi
2009-07-09 12:24         ` Peter Zijlstra
2009-07-09 12:55           ` Lucas De Marchi
2009-07-09 13:15             ` Peter Zijlstra
2009-07-10 10:41         ` [tip:sched/urgent] sched: Reset sched stats on fork() tip-bot for Lucas De Marchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox