Re: Bisected rcu hang (kernel/sched.c): was 2.6.33rc4 RCU hang mm spin_lock deadlock(?) after running libvirtd - reproducible.

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Michael Breuer <mbreuer@majjas.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: paulmck@linux.vnet.ibm.com, linux-kernel@vger.kernel.org
Subject: Re: Bisected rcu hang (kernel/sched.c): was 2.6.33rc4 RCU hang mm spin_lock deadlock(?) after running libvirtd - reproducible.
Date: Mon, 25 Jan 2010 11:14:17 -0500	[thread overview]
Message-ID: <4B5DC359.8080107@majjas.com> (raw)
In-Reply-To: <1264435420.4283.1927.camel@laptop>

On 1/25/2010 11:03 AM, Peter Zijlstra wrote:
> On Sat, 2010-01-23 at 21:49 -0500, Michael Breuer wrote:
>    
>> Libvirtd always triggers the crash; other things that fork and use mmap
>> sometimes do (vsftpd, for example).
>>      
> I bet its the nscgroup trainwreck, does this fix it?
>
> ---
> commit fabf318e5e4bda0aca2b0d617b191884fda62703
> Author: Peter Zijlstra<a.p.zijlstra@chello.nl>
> Date:   Thu Jan 21 21:04:57 2010 +0100
>
>      sched: Fix fork vs hotplug vs cpuset namespaces
>
>      There are a number of issues:
>
>      1) TASK_WAKING vs cgroup_clone (cpusets)
>
>      copy_process():
>
>        sched_fork()
>          child->state = TASK_WAKING; /* waiting for wake_up_new_task() */
>        if (current->nsproxy != p->nsproxy)
>           ns_cgroup_clone()
>             cgroup_clone()
>               mutex_lock(inode->i_mutex)
>               mutex_lock(cgroup_mutex)
>               cgroup_attach_task()
>      	   ss->can_attach()
>                 ss->attach() [ ->  cpuset_attach() ]
>                   cpuset_attach_task()
>                     set_cpus_allowed_ptr();
>                       while (child->state == TASK_WAKING)
>                         cpu_relax();
>      will deadlock the system.
>
>
>      2) cgroup_clone (cpusets) vs copy_process
>
>      So even if the above would work we still have:
>
>      copy_process():
>
>        if (current->nsproxy != p->nsproxy)
>           ns_cgroup_clone()
>             cgroup_clone()
>               mutex_lock(inode->i_mutex)
>               mutex_lock(cgroup_mutex)
>               cgroup_attach_task()
>      	   ss->can_attach()
>                 ss->attach() [ ->  cpuset_attach() ]
>                   cpuset_attach_task()
>                     set_cpus_allowed_ptr();
>        ...
>
>        p->cpus_allowed = current->cpus_allowed
>
>      over-writing the modified cpus_allowed.
>
>
>      3) fork() vs hotplug
>
>        if we unplug the child's cpu after the sanity check when the child
>        gets attached to the task_list but before wake_up_new_task() shit
>        will meet with fan.
>
>      Solve all these issues by moving fork cpu selection into
>      wake_up_new_task().
>
>      Reported-by: Serge E. Hallyn<serue@us.ibm.com>
>      Tested-by: Serge E. Hallyn<serue@us.ibm.com>
>      Signed-off-by: Peter Zijlstra<a.p.zijlstra@chello.nl>
>      LKML-Reference:<1264106190.4283.1314.camel@laptop>
>      Signed-off-by: Thomas Gleixner<tglx@linutronix.de>
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 5b2959b..f88bd98 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1241,21 +1241,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>   	/* Need tasklist lock for parent etc handling! */
>   	write_lock_irq(&tasklist_lock);
>
> -	/*
> -	 * The task hasn't been attached yet, so its cpus_allowed mask will
> -	 * not be changed, nor will its assigned CPU.
> -	 *
> -	 * The cpus_allowed mask of the parent may have changed after it was
> -	 * copied first time - so re-copy it here, then check the child's CPU
> -	 * to ensure it is on a valid CPU (and if not, just force it back to
> -	 * parent's CPU). This avoids alot of nasty races.
> -	 */
> -	p->cpus_allowed = current->cpus_allowed;
> -	p->rt.nr_cpus_allowed = current->rt.nr_cpus_allowed;
> -	if (unlikely(!cpu_isset(task_cpu(p), p->cpus_allowed) ||
> -			!cpu_online(task_cpu(p))))
> -		set_task_cpu(p, smp_processor_id());
> -
>   	/* CLONE_PARENT re-uses the old parent */
>   	if (clone_flags&  (CLONE_PARENT|CLONE_THREAD)) {
>   		p->real_parent = current->real_parent;
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 4508fe7..3a8fb30 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -2320,14 +2320,12 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
>   }
>
>   /*
> - * Called from:
> + * Gets called from 3 sites (exec, fork, wakeup), since it is called without
> + * holding rq->lock we need to ensure ->cpus_allowed is stable, this is done
> + * by:
>    *
> - *  - fork, @p is stable because it isn't on the tasklist yet
> - *
> - *  - exec, @p is unstable, retry loop
> - *
> - *  - wake-up, we serialize ->cpus_allowed against TASK_WAKING so
> - *             we should be good.
> + *  exec:           is unstable, retry loop
> + *  fork&  wake-up: serialize ->cpus_allowed against TASK_WAKING
>    */
>   static inline
>   int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
> @@ -2620,9 +2618,6 @@ void sched_fork(struct task_struct *p, int clone_flags)
>   	if (p->sched_class->task_fork)
>   		p->sched_class->task_fork(p);
>
> -#ifdef CONFIG_SMP
> -	cpu = select_task_rq(p, SD_BALANCE_FORK, 0);
> -#endif
>   	set_task_cpu(p, cpu);
>
>   #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
> @@ -2652,6 +2647,21 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
>   {
>   	unsigned long flags;
>   	struct rq *rq;
> +	int cpu = get_cpu();
> +
> +#ifdef CONFIG_SMP
> +	/*
> +	 * Fork balancing, do it here and not earlier because:
> +	 *  - cpus_allowed can change in the fork path
> +	 *  - any previously selected cpu might disappear through hotplug
> +	 *
> +	 * We still have TASK_WAKING but PF_STARTING is gone now, meaning
> +	 * ->cpus_allowed is stable, we have preemption disabled, meaning
> +	 * cpu_online_mask is stable.
> +	 */
> +	cpu = select_task_rq(p, SD_BALANCE_FORK, 0);
> +	set_task_cpu(p, cpu);
> +#endif
>
>   	rq = task_rq_lock(p,&flags);
>   	BUG_ON(p->state != TASK_WAKING);
> @@ -2665,6 +2675,7 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
>   		p->sched_class->task_woken(rq, p);
>   #endif
>   	task_rq_unlock(rq,&flags);
> +	put_cpu();
>   }
>
>   #ifdef CONFIG_PREEMPT_NOTIFIERS
> @@ -7139,14 +7150,18 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>   	 * the ->cpus_allowed mask from under waking tasks, which would be
>   	 * possible when we change rq->lock in ttwu(), so synchronize against
>   	 * TASK_WAKING to avoid that.
> +	 *
> +	 * Make an exception for freshly cloned tasks, since cpuset namespaces
> +	 * might move the task about, we have to validate the target in
> +	 * wake_up_new_task() anyway since the cpu might have gone away.
>   	 */
>   again:
> -	while (p->state == TASK_WAKING)
> +	while (p->state == TASK_WAKING&&  !(p->flags&  PF_STARTING))
>   		cpu_relax();
>
>   	rq = task_rq_lock(p,&flags);
>
> -	if (p->state == TASK_WAKING) {
> +	if (p->state == TASK_WAKING&&  !(p->flags&  PF_STARTING)) {
>   		task_rq_unlock(rq,&flags);
>   		goto again;
>   	}
>
>    
Yes this solved it. Applied yesterday after Mike Galbraith sent me the 
same patch from tip.

     prev parent reply	other threads:[~2010-01-25 16:14 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-01-09 22:21 2.6.33RC3 Sky2 oops - Driver tries to sync DMA memory it has not allocated Michael Breuer
2010-01-10 20:10 ` 2.6.33RC3 libvirtd ->sky2 & rcu oops (was Sky2 oops - Driver tries to sync DMA memory it has not allocated) Michael Breuer
2010-01-12  1:49   ` Paul E. McKenney
2010-01-13 18:43     ` 2.6.33rc4 RCU hang mm spin_lock deadlock(?) after running libvirtd - reproducible Michael Breuer
2010-01-13 18:58       ` Paul E. McKenney
2010-01-24  2:49       ` Bisected rcu hang (kernel/sched.c): was " Michael Breuer
2010-01-24  5:59         ` Mike Galbraith
2010-01-24  6:32           ` Michael Breuer
2010-01-24  7:19             ` Mike Galbraith
2010-01-25 16:03         ` Peter Zijlstra
2010-01-25 16:14           ` Michael Breuer [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B5DC359.8080107@majjas.com \
    --to=mbreuer@majjas.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.