Re: [regression] cpuset,mm: update tasks' mems_allowed in time (58568d2)

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Nick Piggin <npiggin@suse.de>
To: Miao Xie <miaox@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Lee Schermerhorn <lee.schermerhorn@hp.com>
Subject: Re: [regression] cpuset,mm: update tasks' mems_allowed in time (58568d2)
Date: Mon, 22 Feb 2010 23:06:05 +1100	[thread overview]
Message-ID: <20100222120605.GU9738@laptop> (raw)
In-Reply-To: <4B827043.3060305@cn.fujitsu.com>

On Mon, Feb 22, 2010 at 07:53:39PM +0800, Miao Xie wrote:
> I'm sorry for replying this late.

No problem.

 
> on 2010-2-19 18:06, David Rientjes wrote:
> > On Fri, 19 Feb 2010, Nick Piggin wrote:
> > 
> >>> guarantee_online_cpus() truly does require callback_mutex, the 
> >>> cgroup_scan_tasks() iterator locking can protect changes in the cgroup 
> >>> hierarchy but it doesn't protect a store to cs->cpus_allowed or for 
> >>> hotplug.
> >>
> >> Right, but the callback_mutex was being removed by this patch.
> >>
> > 
> > I was making the case for it to be readded :)
> 
> But cgroup_mutex is held when someone changes cs->cpus_allowed or doing hotplug,
> so I think callback_mutex is not necessary in this case.

So long as that's done consistently (and we should update the comments
too).


> >>> top_cpuset.cpus_allowed will always need to track cpu_active_map since 
> >>> those are the schedulable cpus, it looks like that's initialized for SMP 
> >>> and the cpu hotplug notifier does that correctly.
> >>>
> >>> I'm not sure what the logic is doing in cpuset_attach() where cs is the 
> >>> cpuset to attach to:
> >>>
> >>> 	if (cs == &top_cpuset) {
> >>> 		cpumask_copy(cpus_attach, cpu_possible_mask);
> >>> 		to = node_possible_map;
> >>> 	}
> >>>
> >>> cpus_attach is properly protected by cgroup_lock, but using 
> >>> node_possible_map here will set task->mems_allowed to node_possible_map 
> >>> when the cpuset does not have memory_migrate enabled.  This is the source 
> >>> of your oops, I think.
> >>
> >> Could be, yes.
> >>
> > 
> > I'd be interested to see if you still get the same oops with the patch at 
> > the end of this email that fixes this logic.
> 
> I think this patch can't fix this bug, because mems_allowed of tasks in the
> top group is set to node_possible_map by default, not when the task is 
> attached.
> 
> I made a new patch at the end of this email to fix it, but I have no machine
> to test it now. who can test it for me.

I can test it for you, but I wonder about your barriers.

> 
> ---
> diff --git a/init/main.c b/init/main.c
> index 4cb47a1..512ba15 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -846,7 +846,7 @@ static int __init kernel_init(void * unused)
>  	/*
>  	 * init can allocate pages on any node
>  	 */
> -	set_mems_allowed(node_possible_map);
> +	set_mems_allowed(node_states[N_HIGH_MEMORY]);
>  	/*
>  	 * init can run on any cpu.
>  	 */
> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index ba401fa..e29b440 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -935,10 +935,12 @@ static void cpuset_migrate_mm(struct mm_struct *mm, const nodemask_t *from,
>  	struct task_struct *tsk = current;
>  
>  	tsk->mems_allowed = *to;
> +	wmb();
>  
>  	do_migrate_pages(mm, from, to, MPOL_MF_MOVE_ALL);
>  
>  	guarantee_online_mems(task_cs(tsk),&tsk->mems_allowed);
> +	wmb();
>  }
>  
>  /*

You always need to comment barriers (and use smp_ variants unless you're
doing mmio).


> @@ -1391,11 +1393,10 @@ static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cont,
>  
>  	if (cs == &top_cpuset) {
>  		cpumask_copy(cpus_attach, cpu_possible_mask);
> -		to = node_possible_map;
>  	} else {
>  		guarantee_online_cpus(cs, cpus_attach);
> -		guarantee_online_mems(cs, &to);
>  	}
> +	guarantee_online_mems(cs, &to);
>  
>  	/* do per-task migration stuff possibly for each in the threadgroup */
>  	cpuset_attach_task(tsk, &to, cs);
> @@ -2090,15 +2091,19 @@ static int cpuset_track_online_cpus(struct notifier_block *unused_nb,
>  static int cpuset_track_online_nodes(struct notifier_block *self,
>  				unsigned long action, void *arg)
>  {
> +	nodemask_t oldmems;
> +
>  	cgroup_lock();
>  	switch (action) {
>  	case MEM_ONLINE:
> -	case MEM_OFFLINE:
> +		oldmems = top_cpuset.mems_allowed;
>  		mutex_lock(&callback_mutex);
>  		top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
>  		mutex_unlock(&callback_mutex);
> -		if (action == MEM_OFFLINE)
> -			scan_for_empty_cpusets(&top_cpuset);
> +		update_tasks_nodemask(&top_cpuset, &oldmems, NULL);
> +		break;
> +	case MEM_OFFLINE:
> +		scan_for_empty_cpusets(&top_cpuset);
>  		break;
>  	default:
>  		break;
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index fbb6222..84c7f99 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -219,7 +219,7 @@ int kthreadd(void *unused)
>  	set_task_comm(tsk, "kthreadd");
>  	ignore_signals(tsk);
>  	set_cpus_allowed_ptr(tsk, cpu_all_mask);
> -	set_mems_allowed(node_possible_map);
> +	set_mems_allowed(node_states[N_HIGH_MEMORY]);
>  
>  	current->flags |= PF_NOFREEZE | PF_FREEZER_NOSIG;
>

WARNING: multiple messages have this Message-ID (diff)

From: Nick Piggin <npiggin@suse.de>
To: Miao Xie <miaox@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Lee Schermerhorn <lee.schermerhorn@hp.com>
Subject: Re: [regression] cpuset,mm: update tasks' mems_allowed in time (58568d2)
Date: Mon, 22 Feb 2010 23:06:05 +1100	[thread overview]
Message-ID: <20100222120605.GU9738@laptop> (raw)
In-Reply-To: <4B827043.3060305@cn.fujitsu.com>

On Mon, Feb 22, 2010 at 07:53:39PM +0800, Miao Xie wrote:
> I'm sorry for replying this late.

No problem.

 
> on 2010-2-19 18:06, David Rientjes wrote:
> > On Fri, 19 Feb 2010, Nick Piggin wrote:
> > 
> >>> guarantee_online_cpus() truly does require callback_mutex, the 
> >>> cgroup_scan_tasks() iterator locking can protect changes in the cgroup 
> >>> hierarchy but it doesn't protect a store to cs->cpus_allowed or for 
> >>> hotplug.
> >>
> >> Right, but the callback_mutex was being removed by this patch.
> >>
> > 
> > I was making the case for it to be readded :)
> 
> But cgroup_mutex is held when someone changes cs->cpus_allowed or doing hotplug,
> so I think callback_mutex is not necessary in this case.

So long as that's done consistently (and we should update the comments
too).


> >>> top_cpuset.cpus_allowed will always need to track cpu_active_map since 
> >>> those are the schedulable cpus, it looks like that's initialized for SMP 
> >>> and the cpu hotplug notifier does that correctly.
> >>>
> >>> I'm not sure what the logic is doing in cpuset_attach() where cs is the 
> >>> cpuset to attach to:
> >>>
> >>> 	if (cs == &top_cpuset) {
> >>> 		cpumask_copy(cpus_attach, cpu_possible_mask);
> >>> 		to = node_possible_map;
> >>> 	}
> >>>
> >>> cpus_attach is properly protected by cgroup_lock, but using 
> >>> node_possible_map here will set task->mems_allowed to node_possible_map 
> >>> when the cpuset does not have memory_migrate enabled.  This is the source 
> >>> of your oops, I think.
> >>
> >> Could be, yes.
> >>
> > 
> > I'd be interested to see if you still get the same oops with the patch at 
> > the end of this email that fixes this logic.
> 
> I think this patch can't fix this bug, because mems_allowed of tasks in the
> top group is set to node_possible_map by default, not when the task is 
> attached.
> 
> I made a new patch at the end of this email to fix it, but I have no machine
> to test it now. who can test it for me.

I can test it for you, but I wonder about your barriers.

> 
> ---
> diff --git a/init/main.c b/init/main.c
> index 4cb47a1..512ba15 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -846,7 +846,7 @@ static int __init kernel_init(void * unused)
>  	/*
>  	 * init can allocate pages on any node
>  	 */
> -	set_mems_allowed(node_possible_map);
> +	set_mems_allowed(node_states[N_HIGH_MEMORY]);
>  	/*
>  	 * init can run on any cpu.
>  	 */
> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index ba401fa..e29b440 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -935,10 +935,12 @@ static void cpuset_migrate_mm(struct mm_struct *mm, const nodemask_t *from,
>  	struct task_struct *tsk = current;
>  
>  	tsk->mems_allowed = *to;
> +	wmb();
>  
>  	do_migrate_pages(mm, from, to, MPOL_MF_MOVE_ALL);
>  
>  	guarantee_online_mems(task_cs(tsk),&tsk->mems_allowed);
> +	wmb();
>  }
>  
>  /*

You always need to comment barriers (and use smp_ variants unless you're
doing mmio).


> @@ -1391,11 +1393,10 @@ static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cont,
>  
>  	if (cs == &top_cpuset) {
>  		cpumask_copy(cpus_attach, cpu_possible_mask);
> -		to = node_possible_map;
>  	} else {
>  		guarantee_online_cpus(cs, cpus_attach);
> -		guarantee_online_mems(cs, &to);
>  	}
> +	guarantee_online_mems(cs, &to);
>  
>  	/* do per-task migration stuff possibly for each in the threadgroup */
>  	cpuset_attach_task(tsk, &to, cs);
> @@ -2090,15 +2091,19 @@ static int cpuset_track_online_cpus(struct notifier_block *unused_nb,
>  static int cpuset_track_online_nodes(struct notifier_block *self,
>  				unsigned long action, void *arg)
>  {
> +	nodemask_t oldmems;
> +
>  	cgroup_lock();
>  	switch (action) {
>  	case MEM_ONLINE:
> -	case MEM_OFFLINE:
> +		oldmems = top_cpuset.mems_allowed;
>  		mutex_lock(&callback_mutex);
>  		top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
>  		mutex_unlock(&callback_mutex);
> -		if (action == MEM_OFFLINE)
> -			scan_for_empty_cpusets(&top_cpuset);
> +		update_tasks_nodemask(&top_cpuset, &oldmems, NULL);
> +		break;
> +	case MEM_OFFLINE:
> +		scan_for_empty_cpusets(&top_cpuset);
>  		break;
>  	default:
>  		break;
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index fbb6222..84c7f99 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -219,7 +219,7 @@ int kthreadd(void *unused)
>  	set_task_comm(tsk, "kthreadd");
>  	ignore_signals(tsk);
>  	set_cpus_allowed_ptr(tsk, cpu_all_mask);
> -	set_mems_allowed(node_possible_map);
> +	set_mems_allowed(node_states[N_HIGH_MEMORY]);
>  
>  	current->flags |= PF_NOFREEZE | PF_FREEZER_NOSIG;
>  

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2010-02-22 12:06 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-18 13:49 [regression] cpuset,mm: update tasks' mems_allowed in time (58568d2) Nick Piggin
2010-02-18 13:49 ` Nick Piggin
2010-02-18 21:38 ` David Rientjes
2010-02-18 21:38   ` David Rientjes
2010-02-19  3:31   ` Nick Piggin
2010-02-19  3:31     ` Nick Piggin
2010-02-19 10:06     ` David Rientjes
2010-02-19 10:06       ` David Rientjes
2010-02-22 11:53       ` Miao Xie
2010-02-22 11:53         ` Miao Xie
2010-02-22 12:06         ` Nick Piggin [this message]
2010-02-22 12:06           ` Nick Piggin
2010-02-23  1:48           ` Miao Xie
2010-02-23  1:48             ` Miao Xie
2010-02-22 22:06         ` David Rientjes
2010-02-22 22:06           ` David Rientjes
2010-02-23  7:32           ` Miao Xie
2010-02-23  7:32             ` Miao Xie
2010-02-23  8:55             ` David Rientjes
2010-02-23  8:55               ` David Rientjes
2010-02-23  9:23               ` Miao Xie
2010-02-23  9:23                 ` Miao Xie
2010-02-23 22:31                 ` David Rientjes
2010-02-23 22:31                   ` David Rientjes
2010-02-24  9:35                   ` Miao Xie
2010-02-24  9:35                     ` Miao Xie
2010-02-24 21:08                     ` David Rientjes
2010-02-24 21:08                       ` David Rientjes
2010-02-25  1:18                       ` Miao Xie
2010-02-25  1:18                         ` Miao Xie
2010-02-22 12:12       ` Nick Piggin
2010-02-22 12:12         ` Nick Piggin
2010-02-22 22:00         ` David Rientjes
2010-02-22 22:00           ` David Rientjes
2010-02-23  8:25           ` Miao Xie
2010-02-23  8:25             ` Miao Xie
2010-02-23  8:44             ` David Rientjes
2010-02-23  8:44               ` David Rientjes
2010-02-24  9:49               ` Miao Xie
2010-02-24  9:49                 ` Miao Xie
2010-02-24 21:06                 ` David Rientjes
2010-02-24 21:06                   ` David Rientjes
2010-02-19  7:51 ` KOSAKI Motohiro
2010-02-19  7:51   ` KOSAKI Motohiro
2010-02-19  9:42   ` David Rientjes
2010-02-19  9:42     ` David Rientjes
2010-02-19  7:56 ` KAMEZAWA Hiroyuki
2010-02-19  7:56   ` KAMEZAWA Hiroyuki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100222120605.GU9738@laptop \
    --to=npiggin@suse.de \
    --cc=lee.schermerhorn@hp.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=miaox@cn.fujitsu.com \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.