public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 2/2] cpusets: update tasks' cpus_allowed and mems_allowed after CPU/NODE offline/online
@ 2008-06-02  8:33 Miao Xie
  0 siblings, 0 replies; 11+ messages in thread
From: Miao Xie @ 2008-06-02  8:33 UTC (permalink / raw)
  To: Andrew Morton, Linux-Kernel; +Cc: Paul Jackson, Paul Menage

The bug is, a task may run on the cpu/node which is not in its cpuset.cpus/
cpuset.mems.

It can be reproduced by the following commands:
-----------------------------------
# mkdir /dev/cpuset
# mount -t cpuset xxx /dev/cpuset
# mkdir /dev/cpuset/0
# echo 0-1 > /dev/cpuset/0/cpus
# echo 0 > /dev/cpuset/0/mems
# echo $$ > /dev/cpuset/0/tasks
# echo 0 > /sys/devices/system/cpu/cpu1/online
# echo 1 > /sys/devices/system/cpu/cpu1/online
-----------------------------------

There is only CPU0 in cpuset.cpus, but the task in this cpuset runs on
both CPU0 and CPU1.

It is because the task's cpu_allowed didn't get updated after we did
CPU offline/online manipulation. Similar for mem_allowed.

This patch fixes this bug expect for root cpuset. Because there is a
problem about root cpuset, in that whether it is necessary to update all
the tasks in root cpuset or not after cpu/node offline/online.

If updating, some kernel threads which is bound into a specified cpu will
be unbound.

If not updating, there is a bug in root cpuset. This bug is also caused
by offline/online manipulation. For example, there is a dual-cpu
machine. we create a sub cpuset in root cpuset and assign 1 to its cpus.
And then we attach some tasks into this sub cpuset. After this, we
offline CPU1. Now, the tasks in this new cpuset are moved into root
cpuset automatically because there is no cpu in sub cpuset. Then we
online CPU1, we find all the tasks which doesn't belong to root cpuset
originally just run on CPU0.

Maybe we need to add a flag in the task_struct to mark which task can't be
unbound?

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Acked-by: Paul Jackson <pj@sgi.com>

---
 kernel/cpuset.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index dc16f71..4cc012e 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1872,6 +1872,7 @@ static void scan_for_empty_cpusets(const struct cpuset *root)
 	struct cpuset *child;	/* scans child cpusets of cp */
 	struct list_head queue;
 	struct cgroup *cont;
+	nodemask_t oldmems;
 
 	INIT_LIST_HEAD(&queue);
 
@@ -1891,6 +1892,8 @@ static void scan_for_empty_cpusets(const struct cpuset *root)
 		    nodes_subset(cp->mems_allowed, node_states[N_HIGH_MEMORY]))
 			continue;
 
+		oldmems = cp->mems_allowed;
+
 		/* Remove offline cpus and mems from this cpuset. */
 		mutex_lock(&callback_mutex);
 		cpus_and(cp->cpus_allowed, cp->cpus_allowed, cpu_online_map);
@@ -1902,6 +1905,10 @@ static void scan_for_empty_cpusets(const struct cpuset *root)
 		if (cpus_empty(cp->cpus_allowed) ||
 		     nodes_empty(cp->mems_allowed))
 			remove_tasks_in_empty_cpuset(cp);
+		else {
+			update_tasks_cpumask(cp);
+			update_tasks_nodemask(cp, &oldmems);
+		}
 	}
 }
 
-- 
1.5.4.rc3



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] cpusets: update tasks' cpus_allowed and mems_allowed after CPU/NODE offline/online
@ 2008-06-04  2:05 Miao Xie
  2008-06-04  9:30 ` Paul Menage
  0 siblings, 1 reply; 11+ messages in thread
From: Miao Xie @ 2008-06-04  2:05 UTC (permalink / raw)
  To: Andrew Morton, Linux-Kernel, Paul Jackson, Paul Menage

The bug is, a task may run on the cpu/node which is not in its cpuset.cpus/
cpuset.mems.

It can be reproduced by the following commands:
-----------------------------------
# mkdir /dev/cpuset
# mount -t cpuset xxx /dev/cpuset
# mkdir /dev/cpuset/0
# echo 0-1 > /dev/cpuset/0/cpus
# echo 0 > /dev/cpuset/0/mems
# echo $$ > /dev/cpuset/0/tasks
# echo 0 > /sys/devices/system/cpu/cpu1/online
# echo 1 > /sys/devices/system/cpu/cpu1/online
-----------------------------------

There is only CPU0 in cpuset.cpus, but the task in this cpuset runs on
both CPU0 and CPU1.

It is because the task's cpu_allowed didn't get updated after we did
CPU offline/online manipulation. Similar for mem_allowed.

This patch fixes this bug expect for root cpuset. Because there is a
problem about root cpuset, in that whether it is necessary to update all
the tasks in root cpuset or not after cpu/node offline/online.

If updating, some kernel threads which is bound into a specified cpu will
be unbound.

If not updating, there is a bug in root cpuset. This bug is also caused
by offline/online manipulation. For example, there is a dual-cpu
machine. we create a sub cpuset in root cpuset and assign 1 to its cpus.
And then we attach some tasks into this sub cpuset. After this, we
offline CPU1. Now, the tasks in this new cpuset are moved into root
cpuset automatically because there is no cpu in sub cpuset. Then we
online CPU1, we find all the tasks which doesn't belong to root cpuset
originally just run on CPU0.

Maybe we need to add a flag in the task_struct to mark which task can't be
unbound?

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Acked-by: Paul Jackson <pj@sgi.com>

---
 kernel/cpuset.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index aa955ee..ea7fdb9 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1876,6 +1876,7 @@ static void scan_for_empty_cpusets(const struct cpuset *root)
 	struct cpuset *child;	/* scans child cpusets of cp */
 	struct list_head queue;
 	struct cgroup *cont;
+	nodemask_t oldmems;
 
 	INIT_LIST_HEAD(&queue);
 
@@ -1895,6 +1896,8 @@ static void scan_for_empty_cpusets(const struct cpuset *root)
 		    nodes_subset(cp->mems_allowed, node_states[N_HIGH_MEMORY]))
 			continue;
 
+		oldmems = cp->mems_allowed;
+
 		/* Remove offline cpus and mems from this cpuset. */
 		mutex_lock(&callback_mutex);
 		cpus_and(cp->cpus_allowed, cp->cpus_allowed, cpu_online_map);
@@ -1906,6 +1909,10 @@ static void scan_for_empty_cpusets(const struct cpuset *root)
 		if (cpus_empty(cp->cpus_allowed) ||
 		     nodes_empty(cp->mems_allowed))
 			remove_tasks_in_empty_cpuset(cp);
+		else {
+			update_tasks_cpumask(cp);
+			update_tasks_nodemask(cp, &oldmems);
+		}
 	}
 }
 
-- 
1.5.4.rc3



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] cpusets: update tasks' cpus_allowed and mems_allowed after CPU/NODE offline/online
  2008-06-04  2:05 Miao Xie
@ 2008-06-04  9:30 ` Paul Menage
  2008-06-04  9:58   ` Paul Menage
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Menage @ 2008-06-04  9:30 UTC (permalink / raw)
  To: miaox; +Cc: Andrew Morton, Linux-Kernel, Paul Jackson

On Tue, Jun 3, 2008 at 7:05 PM, Miao Xie <miaox@cn.fujitsu.com> wrote:
>
> Maybe we need to add a flag in the task_struct to mark which task can't be
> unbound?

Yes, something like a PF_CPU_BOUND flag that gets set in
kthread_bind() and checked for in set_cpus_allowed() would be the
right way to handle that. We have an internal change like that against
an older kernel verison - I'll see if I can find someone to forward
port it and send it in.

Paul

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] cpusets: update tasks' cpus_allowed and mems_allowed after CPU/NODE offline/online
  2008-06-04  9:30 ` Paul Menage
@ 2008-06-04  9:58   ` Paul Menage
  2008-06-04 17:22     ` Paul Jackson
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Menage @ 2008-06-04  9:58 UTC (permalink / raw)
  To: miaox; +Cc: Andrew Morton, Linux-Kernel, Paul Jackson, David Rientjes

On Wed, Jun 4, 2008 at 2:30 AM, Paul Menage <menage@google.com> wrote:
> On Tue, Jun 3, 2008 at 7:05 PM, Miao Xie <miaox@cn.fujitsu.com> wrote:
>>
>> Maybe we need to add a flag in the task_struct to mark which task can't be
>> unbound?
>
> Yes, something like a PF_CPU_BOUND flag that gets set in
> kthread_bind() and checked for in set_cpus_allowed() would be the
> right way to handle that. We have an internal change like that against
> an older kernel verison - I'll see if I can find someone to forward
> port it and send it in.

And in fact they already did, and sent it to lkml a few months ago:

   http://marc.info/?l=linux-kernel&m=120419372032623

Paul

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] cpusets: update tasks' cpus_allowed and mems_allowed after CPU/NODE offline/online
  2008-06-04  9:58   ` Paul Menage
@ 2008-06-04 17:22     ` Paul Jackson
  2008-06-04 19:01       ` David Rientjes
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Jackson @ 2008-06-04 17:22 UTC (permalink / raw)
  To: Paul Menage; +Cc: miaox, akpm, linux-kernel, rientjes

Paul M, quoting Miao Xie <miaox@cn.fujitsu.com>:
>>
>> Maybe we need to add a flag in the task_struct to mark which task can't be
>> unbound?

Do we need a new PF_* flag for this?  Perhaps one can test for this
by examining the currently available properties of tasks.  Would it
be sufficient to look for kernel threads (NULL mm_struct) whose
cpus_allowed is a strict subset of the online CPUs?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.940.382.4214

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] cpusets: update tasks' cpus_allowed and mems_allowed after CPU/NODE offline/online
  2008-06-04 17:22     ` Paul Jackson
@ 2008-06-04 19:01       ` David Rientjes
  2008-06-04 19:25         ` Paul Jackson
  0 siblings, 1 reply; 11+ messages in thread
From: David Rientjes @ 2008-06-04 19:01 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Paul Menage, miaox, akpm, linux-kernel

On Wed, 4 Jun 2008, Paul Jackson wrote:

> Do we need a new PF_* flag for this?  Perhaps one can test for this
> by examining the currently available properties of tasks.  Would it
> be sufficient to look for kernel threads (NULL mm_struct) whose
> cpus_allowed is a strict subset of the online CPUs?
> 

That would only identify kthreads that have been created with a subsequent 
call to set_cpus_allowed() or kthread_bind().

The PF_CPU_BOUND change targets only the latter since there are kthreads, 
such as kstopmachine, that can continue to manipulate their cpus_allowed 
during their lifetime.

Other kthreads such as the scheduler migration thread and soft lockup 
watchdog thread, however, always stay bound to a single cpu by use of 
kthread_bind().  These are the tasks that get the PF_CPU_BOUND flag and 
cannot be rebound via set_cpus_allowed() because of their negative 
effects.

		David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] cpusets: update tasks' cpus_allowed and mems_allowed after CPU/NODE offline/online
  2008-06-04 19:01       ` David Rientjes
@ 2008-06-04 19:25         ` Paul Jackson
  2008-06-04 19:33           ` David Rientjes
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Jackson @ 2008-06-04 19:25 UTC (permalink / raw)
  To: David Rientjes; +Cc: menage, miaox, akpm, linux-kernel

David wrote:
> That would only identify kthreads that have been created with a subsequent 
> call to set_cpus_allowed() or kthread_bind().
> 
> The PF_CPU_BOUND change targets only the latter since there are kthreads, 
> such as kstopmachine, that can continue to manipulate their cpus_allowed 
> during their lifetime.

Would your first sentence be more clearly written as:

> That would identify both kthreads that have been created with a subsequent 
> call to set_cpus_allowed() or kthread_bind().

Or do I misunderstand?

If I am reading you correctly, then would it work to have a check in the
cpuset code (rather than in the lower set_cpus_allowed() routine),
where that check refused to move tasks out of the root cpuset if they
were (1) kernel threads (mm NULL) and (2) had cpus_allowed that were a
strict subset of the root cpusets 'cpus' (the online cpus).

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.940.382.4214

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] cpusets: update tasks' cpus_allowed and mems_allowed after CPU/NODE offline/online
  2008-06-04 19:25         ` Paul Jackson
@ 2008-06-04 19:33           ` David Rientjes
  2008-06-04 19:42             ` Paul Jackson
  0 siblings, 1 reply; 11+ messages in thread
From: David Rientjes @ 2008-06-04 19:33 UTC (permalink / raw)
  To: Paul Jackson; +Cc: menage, miaox, akpm, linux-kernel

On Wed, 4 Jun 2008, Paul Jackson wrote:

> > That would identify both kthreads that have been created with a subsequent 
> > call to set_cpus_allowed() or kthread_bind().
> 
> Or do I misunderstand?
> 
> If I am reading you correctly, then would it work to have a check in the
> cpuset code (rather than in the lower set_cpus_allowed() routine),
> where that check refused to move tasks out of the root cpuset if they
> were (1) kernel threads (mm NULL) and (2) had cpus_allowed that were a
> strict subset of the root cpusets 'cpus' (the online cpus).
> 

No, because sched_setaffinity() can still move the threads.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] cpusets: update tasks' cpus_allowed and mems_allowed after CPU/NODE offline/online
  2008-06-04 19:33           ` David Rientjes
@ 2008-06-04 19:42             ` Paul Jackson
  2008-06-04 20:02               ` David Rientjes
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Jackson @ 2008-06-04 19:42 UTC (permalink / raw)
  To: David Rientjes; +Cc: menage, miaox, akpm, linux-kernel

David wrote:
> No, because sched_setaffinity() can still move the threads.

Good point.

How about also adding this check (NULL mm and struct subset cpus)
to sched_setaffinity?

Granted, this pair of checks, in cpusets and sched_setaffinity,
is getting to be a little more work than the PF_* flag.

I guess one question would be how hard we want to work to avoid
consuming another PF_* flag.  I thought they were scarce, but I
might be wrong.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.940.382.4214

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] cpusets: update tasks' cpus_allowed and mems_allowed after CPU/NODE offline/online
  2008-06-04 19:42             ` Paul Jackson
@ 2008-06-04 20:02               ` David Rientjes
  2008-06-04 20:21                 ` Paul Jackson
  0 siblings, 1 reply; 11+ messages in thread
From: David Rientjes @ 2008-06-04 20:02 UTC (permalink / raw)
  To: Paul Jackson; +Cc: menage, miaox, akpm, linux-kernel

On Wed, 4 Jun 2008, Paul Jackson wrote:

> How about also adding this check (NULL mm and struct subset cpus)
> to sched_setaffinity?
> 
> Granted, this pair of checks, in cpusets and sched_setaffinity,
> is getting to be a little more work than the PF_* flag.
> 

Yes, an alterative to the PF_CPU_BIND flag is to prevent calls to 
set_cpus_allowed() for kthreads that should not change affinity.

> I guess one question would be how hard we want to work to avoid
> consuming another PF_* flag.  I thought they were scarce, but I
> might be wrong.
> 

I think we both agree that in terms of the code itself, adding the flag 
and doing the check in set_cpus_allowed() is the optimal solution.  It can 
simply return -EINVAL for PF_CPU_BIND threads and the threads don't get 
moved.

It's debatable whether allocating an additional PF_* flag for this purpose 
is worthwhile.

I usually don't advocate against adding such bits that have a clear and 
defined purpose for the sole reason that perhaps later we will need 
additional bits that can't be worked around so easily as this one.  It's 
helpful to be conservative in allocating new flags but not at the expense 
of the code itself, especially when loopholes can easily be introduced 
that would have been otherwise prevented.

So I'll repost the patch and see where it goes.

		David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] cpusets: update tasks' cpus_allowed and mems_allowed after CPU/NODE offline/online
  2008-06-04 20:02               ` David Rientjes
@ 2008-06-04 20:21                 ` Paul Jackson
  0 siblings, 0 replies; 11+ messages in thread
From: Paul Jackson @ 2008-06-04 20:21 UTC (permalink / raw)
  To: David Rientjes; +Cc: menage, miaox, akpm, linux-kernel

David wrote:
> So I'll repost the patch and see where it goes.

Ok.  Perhaps you could add a brief comment, noting that
the additional PF_* flag could perhaps be avoided with
checks in cpusets and sched_setaffinity, as explained in
http://lkml.org/lkml/2008/6/4/388.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.940.382.4214

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2008-06-04 20:21 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-02  8:33 [PATCH 2/2] cpusets: update tasks' cpus_allowed and mems_allowed after CPU/NODE offline/online Miao Xie
  -- strict thread matches above, loose matches on Subject: below --
2008-06-04  2:05 Miao Xie
2008-06-04  9:30 ` Paul Menage
2008-06-04  9:58   ` Paul Menage
2008-06-04 17:22     ` Paul Jackson
2008-06-04 19:01       ` David Rientjes
2008-06-04 19:25         ` Paul Jackson
2008-06-04 19:33           ` David Rientjes
2008-06-04 19:42             ` Paul Jackson
2008-06-04 20:02               ` David Rientjes
2008-06-04 20:21                 ` Paul Jackson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox