Re: Bisected rcu hang (kernel/sched.c): was 2.6.33rc4 RCU hang mm spin_lock deadlock(?) after running libvirtd - reproducible.

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Michael Breuer <mbreuer@majjas.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: paulmck@linux.vnet.ibm.com, linux-kernel@vger.kernel.org
Subject: Re: Bisected rcu hang (kernel/sched.c): was 2.6.33rc4 RCU hang mm spin_lock deadlock(?) after running libvirtd - reproducible.
Date: Mon, 25 Jan 2010 11:14:17 -0500	[thread overview]
Message-ID: <4B5DC359.8080107@majjas.com> (raw)
In-Reply-To: <1264435420.4283.1927.camel@laptop>

On 1/25/2010 11:03 AM, Peter Zijlstra wrote:
> On Sat, 2010-01-23 at 21:49 -0500, Michael Breuer wrote:
>    
>> Libvirtd always triggers the crash; other things that fork and use mmap
>> sometimes do (vsftpd, for example).
>>      
> I bet its the nscgroup trainwreck, does this fix it?
>
> ---
> commit fabf318e5e4bda0aca2b0d617b191884fda62703
> Author: Peter Zijlstra<a.p.zijlstra@chello.nl>
> Date:   Thu Jan 21 21:04:57 2010 +0100
>
>      sched: Fix fork vs hotplug vs cpuset namespaces
>
>      There are a number of issues:
>
>      1) TASK_WAKING vs cgroup_clone (cpusets)
>
>      copy_process():
>
>        sched_fork()
>          child->state = TASK_WAKING; /* waiting for wake_up_new_task() */
>        if (current->nsproxy != p->nsproxy)
>           ns_cgroup_clone()
>             cgroup_clone()
>               mutex_lock(inode->i_mutex)
>               mutex_lock(cgroup_mutex)
>               cgroup_attach_task()
>      	   ss->can_attach()
>                 ss->attach() [ ->  cpuset_attach() ]
>                   cpuset_attach_task()
>                     set_cpus_allowed_ptr();
>                       while (child->state == TASK_WAKING)
>                         cpu_relax();
>      will deadlock the system.
>
>
>      2) cgroup_clone (cpusets) vs copy_process
>
>      So even if the above would work we still have:
>
>      copy_process():
>
>        if (current->nsproxy != p->nsproxy)
>           ns_cgroup_clone()
>             cgroup_clone()
>               mutex_lock(inode->i_mutex)
>               mutex_lock(cgroup_mutex)
>               cgroup_attach_task()
>      	   ss->can_attach()
>                 ss->attach() [ ->  cpuset_attach() ]
>                   cpuset_attach_task()
>                     set_cpus_allowed_ptr();
>        ...
>
>        p->cpus_allowed = current->cpus_allowed
>
>      over-writing the modified cpus_allowed.
>
>
>      3) fork() vs hotplug
>
>        if we unplug the child's cpu after the sanity check when the child
>        gets attached to the task_list but before wake_up_new_task() shit
>        will meet with fan.
>
>      Solve all these issues by moving fork cpu selection into
>      wake_up_new_task().
>
>      Reported-by: Serge E. Hallyn<serue@us.ibm.com>
>      Tested-by: Serge E. Hallyn<serue@us.ibm.com>
>      Signed-off-by: Peter Zijlstra<a.p.zijlstra@chello.nl>
>      LKML-Reference:<1264106190.4283.1314.camel@laptop>
>      Signed-off-by: Thomas Gleixner<tglx@linutronix.de>
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 5b2959b..f88bd98 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1241,21 +1241,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>   	/* Need tasklist lock for parent etc handling! */
>   	write_lock_irq(&tasklist_lock);
>
> -	/*
> -	 * The task hasn't been attached yet, so its cpus_allowed mask will
> -	 * not be changed, nor will its assigned CPU.
> -	 *
> -	 * The cpus_allowed mask of the parent may have changed after it was
> -	 * copied first time - so re-copy it here, then check the child's CPU
> -	 * to ensure it is on a valid CPU (and if not, just force it back to
> -	 * parent's CPU). This avoids alot of nasty races.
> -	 */
> -	p->cpus_allowed = current->cpus_allowed;
> -	p->rt.nr_cpus_allowed = current->rt.nr_cpus_allowed;
> -	if (unlikely(!cpu_isset(task_cpu(p), p->cpus_allowed) ||
> -			!cpu_online(task_cpu(p))))
> -		set_task_cpu(p, smp_processor_id());
> -
>   	/* CLONE_PARENT re-uses the old parent */
>   	if (clone_flags&  (CLONE_PARENT|CLONE_THREAD)) {
>   		p->real_parent = current->real_parent;
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 4508fe7..3a8fb30 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -2320,14 +2320,12 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
>   }
>
>   /*
> - * Called from:
> + * Gets called from 3 sites (exec, fork, wakeup), since it is called without
> + * holding rq->lock we need to ensure ->cpus_allowed is stable, this is done
> + * by:
>    *
> - *  - fork, @p is stable because it isn't on the tasklist yet
> - *
> - *  - exec, @p is unstable, retry loop
> - *
> - *  - wake-up, we serialize ->cpus_allowed against TASK_WAKING so
> - *             we should be good.
> + *  exec:           is unstable, retry loop
> + *  fork&  wake-up: serialize ->cpus_allowed against TASK_WAKING
>    */
>   static inline
>   int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
> @@ -2620,9 +2618,6 @@ void sched_fork(struct task_struct *p, int clone_flags)
>   	if (p->sched_class->task_fork)
>   		p->sched_class->task_fork(p);
>
> -#ifdef CONFIG_SMP
> -	cpu = select_task_rq(p, SD_BALANCE_FORK, 0);
> -#endif
>   	set_task_cpu(p, cpu);
>
>   #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
> @@ -2652,6 +2647,21 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
>   {
>   	unsigned long flags;
>   	struct rq *rq;
> +	int cpu = get_cpu();
> +
> +#ifdef CONFIG_SMP
> +	/*
> +	 * Fork balancing, do it here and not earlier because:
> +	 *  - cpus_allowed can change in the fork path
> +	 *  - any previously selected cpu might disappear through hotplug
> +	 *
> +	 * We still have TASK_WAKING but PF_STARTING is gone now, meaning
> +	 * ->cpus_allowed is stable, we have preemption disabled, meaning
> +	 * cpu_online_mask is stable.
> +	 */
> +	cpu = select_task_rq(p, SD_BALANCE_FORK, 0);
> +	set_task_cpu(p, cpu);
> +#endif
>
>   	rq = task_rq_lock(p,&flags);
>   	BUG_ON(p->state != TASK_WAKING);
> @@ -2665,6 +2675,7 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
>   		p->sched_class->task_woken(rq, p);
>   #endif
>   	task_rq_unlock(rq,&flags);
> +	put_cpu();
>   }
>
>   #ifdef CONFIG_PREEMPT_NOTIFIERS
> @@ -7139,14 +7150,18 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>   	 * the ->cpus_allowed mask from under waking tasks, which would be
>   	 * possible when we change rq->lock in ttwu(), so synchronize against
>   	 * TASK_WAKING to avoid that.
> +	 *
> +	 * Make an exception for freshly cloned tasks, since cpuset namespaces
> +	 * might move the task about, we have to validate the target in
> +	 * wake_up_new_task() anyway since the cpu might have gone away.
>   	 */
>   again:
> -	while (p->state == TASK_WAKING)
> +	while (p->state == TASK_WAKING&&  !(p->flags&  PF_STARTING))
>   		cpu_relax();
>
>   	rq = task_rq_lock(p,&flags);
>
> -	if (p->state == TASK_WAKING) {
> +	if (p->state == TASK_WAKING&&  !(p->flags&  PF_STARTING)) {
>   		task_rq_unlock(rq,&flags);
>   		goto again;
>   	}
>
>    
Yes this solved it. Applied yesterday after Mike Galbraith sent me the 
same patch from tip.

     prev parent reply	other threads:[~2010-01-25 16:14 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-01-09 22:21 2.6.33RC3 Sky2 oops - Driver tries to sync DMA memory it has not allocated Michael Breuer
2010-01-10 20:10 ` 2.6.33RC3 libvirtd ->sky2 & rcu oops (was Sky2 oops - Driver tries to sync DMA memory it has not allocated) Michael Breuer
2010-01-12  1:49   ` Paul E. McKenney
2010-01-13 18:43     ` 2.6.33rc4 RCU hang mm spin_lock deadlock(?) after running libvirtd - reproducible Michael Breuer
2010-01-13 18:58       ` Paul E. McKenney
2010-01-24  2:49       ` Bisected rcu hang (kernel/sched.c): was " Michael Breuer
2010-01-24  5:59         ` Mike Galbraith
2010-01-24  6:32           ` Michael Breuer
2010-01-24  7:19             ` Mike Galbraith
2010-01-25 16:03         ` Peter Zijlstra
2010-01-25 16:14           ` Michael Breuer [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B5DC359.8080107@majjas.com \
    --to=mbreuer@majjas.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox