Re: [PATCH] smp_call_function_many SMP race

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Anton Blanchard <anton@samba.org>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>,
	Ingo Molnar <mingo@elte.hu>, Jens Axboe <jens.axboe@oracle.com>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	Peter Zijlstra <peterz@infradead.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Milton Miller <miltonm@bga.com>, Nick Piggin <npiggin@suse.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] smp_call_function_many SMP race
Date: Tue, 23 Mar 2010 09:41:24 -0700
Message-ID: <20100323164124.GN2517@linux.vnet.ibm.com>
In-Reply-To: <20100323111556.GK24064@kryten>

On Tue, Mar 23, 2010 at 10:15:56PM +1100, Anton Blanchard wrote:
> 
> I noticed a failure where we hit the following WARN_ON in
> generic_smp_call_function_interrupt:
> 
>                 if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))
>                         continue;
> 
>                 data->csd.func(data->csd.info);
> 
>                 refs = atomic_dec_return(&data->refs);
>                 WARN_ON(refs < 0);      <-------------------------
> 
> We atomically tested and cleared our bit in the cpumask, and yet the number
> of cpus left (i.e. refs) was already 0. How can this be?
> 
> It turns out commit c0f68c2fab4898bcc4671a8fb941f428856b4ad5 (generic-ipi:
> cleanup for generic_smp_call_function_interrupt()) is at fault. It removes
> locking from smp_call_function_many and in doing so creates a rather
> complicated race.
> 
> The problem comes about because:
> 
> - The smp_call_function_many interrupt handler walks call_function.queue
>   without any locking.
> - We reuse a percpu data structure in smp_call_function_many.
> - We do not wait for any RCU grace period before starting the next
>   smp_call_function_many.
> 
> Imagine a scenario where CPU A does two smp_call_functions back to back, and
> CPU B does an smp_call_function in between. We concentrate on how CPU C handles
> the calls:
> 
> 
> CPU A                  CPU B                  CPU C
> 
> smp_call_function
>                                               smp_call_function_interrupt
>                                                 walks call_function.queue
>                                                 sees CPU A on list
> 
>                          smp_call_function
> 
>                                               smp_call_function_interrupt
>                                                 walks call_function.queue
>                                                 sees (stale) CPU A on list
> smp_call_function
>   reuses percpu *data
>   set data->cpumask
>                                                 sees and clears bit in cpumask!
>                                                 sees data->refs is 0!
> 
>   set data->refs (too late!)
> 
> 
> The important thing to note is that, because the interrupt handler walks a
> potentially stale call_function.queue without any locking, another cpu can
> view the percpu *data structure at any time, even while the owner is in the
> process of initialising it.
> 
> The following test case hits the WARN_ON 100% of the time on my PowerPC box
> (having 128 threads does help :)
> 
> 
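> #include <linux/module.h>
> #include <linux/init.h>
> #include <linux/kernel.h>	/* printk */
> #include <linux/smp.h>		/* smp_call_function, smp_processor_id */
> #include <linux/workqueue.h>	/* INIT_WORK, schedule_work_on */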
> 
> #define ITERATIONS 100
> 
> static void do_nothing_ipi(void *dummy)
> {
> }
> 
> static void do_ipis(struct work_struct *dummy)
> {
> 	int i;
> 
> 	for (i = 0; i < ITERATIONS; i++)
> 		smp_call_function(do_nothing_ipi, NULL, 1);
> 
> 	printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
> }
> 
> static struct work_struct work[NR_CPUS];
> 
> static int __init testcase_init(void)
> {
> 	int cpu;
> 
> 	for_each_online_cpu(cpu) {
> 		INIT_WORK(&work[cpu], do_ipis);
> 		schedule_work_on(cpu, &work[cpu]);
> 	}
> 
> 	return 0;
> }
> 
> static void __exit testcase_exit(void)
> {
> }
> 
> module_init(testcase_init);
> module_exit(testcase_exit);
> MODULE_LICENSE("GPL");
> MODULE_AUTHOR("Anton Blanchard");
> 
> 
> I tried to fix it by ordering the read and the write of ->cpumask and ->refs.
> In doing so I missed a critical case but Paul McKenney was able to spot
> my bug thankfully :) To ensure we aren't viewing previous iterations the
> interrupt handler needs to read ->refs then ->cpumask then ->refs _again_.
> 
> Thanks to Milton Miller and Paul McKenney for helping to debug this issue.
> 
> ---
> 
> My head hurts. This needs some serious analysis before we can be sure it
> fixes all the races. With all these memory barriers, maybe the previous
> spinlocks weren't so bad after all :)

;-)

Does this patch appear to have fixed things, or do you still have a
failure rate?  In other words, should I be working on a proof of
(in)correctness, or should I be looking for further bugs?

							Thanx, Paul

> Index: linux-2.6/kernel/smp.c
> ===================================================================
> --- linux-2.6.orig/kernel/smp.c	2010-03-23 05:09:08.000000000 -0500
> +++ linux-2.6/kernel/smp.c	2010-03-23 06:12:40.000000000 -0500
> @@ -193,6 +193,31 @@ void generic_smp_call_function_interrupt
>  	list_for_each_entry_rcu(data, &call_function.queue, csd.list) {
>  		int refs;
> 
> +		/*
> +		 * Since we walk the list without any locks, we might
> +		 * see an entry that was completed, removed from the
> +		 * list and is in the process of being reused.
> +		 *
> +		 * Just checking data->refs then data->cpumask is not good
> +		 * enough because we could see a non zero data->refs from a
> +		 * previous iteration. We need to check data->refs, then
> +		 * data->cpumask then data->refs again. Talk about
> +		 * complicated!
> +		 */
> +
> +		if (atomic_read(&data->refs) == 0)
> +			continue;
> +
> +		smp_rmb();
> +
> +		if (!cpumask_test_cpu(cpu, data->cpumask))
> +			continue;
> +
> +		smp_rmb();
> +
> +		if (atomic_read(&data->refs) == 0)
> +			continue;
> +
>  		if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))
>  			continue;
> 
> @@ -446,6 +471,14 @@ void smp_call_function_many(const struct
>  	data->csd.info = info;
>  	cpumask_and(data->cpumask, mask, cpu_online_mask);
>  	cpumask_clear_cpu(this_cpu, data->cpumask);
> +
> +	/*
> +	 * To ensure the interrupt handler gets an up to date view
> +	 * we order the cpumask and refs writes and order the
> +	 * read of them in the interrupt handler.
> +	 */
> +	smp_wmb();
> +
>  	atomic_set(&data->refs, cpumask_weight(data->cpumask));
> 
>  	raw_spin_lock_irqsave(&call_function.lock, flags);
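
For anyone who wants to poke at the ordering outside the kernel, here is a
minimal userspace sketch that replays the one interleaving from the diagram
above in a single thread. It is only a model under stated assumptions: C11
atomics and fences stand in (approximately) for atomic_t, smp_rmb() and
smp_wmb(), the cpumask is squeezed into one unsigned long, and struct cfd,
SKIPPED, handle_unpatched(), handle_patched() and replay() are names made up
for this sketch, not anything in kernel/smp.c. It says nothing about other
interleavings, so it is not a proof of correctness.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define SKIPPED INT_MAX	/* hypothetical marker: handler ignored the entry */

/* Hypothetical stand-in for the reused percpu call_function_data. */
struct cfd {
	atomic_ulong cpumask;	/* bit n set: CPU n still has to run func */
	atomic_int refs;	/* CPUs that have not finished yet */
};

/* Common tail: claim our bit, then drop a reference; returns new refs. */
static int claim_and_put(struct cfd *d, unsigned long bit)
{
	if (!(atomic_fetch_and(&d->cpumask, ~bit) & bit))
		return SKIPPED;
	return atomic_fetch_sub(&d->refs, 1) - 1;
}

/* Handler step before the patch: no checks ahead of the claim. */
static int handle_unpatched(struct cfd *d, int cpu)
{
	return claim_and_put(d, 1UL << cpu);
}

/* Handler step with the patch: read refs, cpumask, then refs again. */
static int handle_patched(struct cfd *d, int cpu)
{
	unsigned long bit = 1UL << cpu;

	if (atomic_load_explicit(&d->refs, memory_order_relaxed) == 0)
		return SKIPPED;
	atomic_thread_fence(memory_order_acquire);	/* models smp_rmb() */
	if (!(atomic_load_explicit(&d->cpumask, memory_order_relaxed) & bit))
		return SKIPPED;
	atomic_thread_fence(memory_order_acquire);	/* models smp_rmb() */
	if (atomic_load_explicit(&d->refs, memory_order_relaxed) == 0)
		return SKIPPED;
	return claim_and_put(d, bit);
}

/*
 * Replay the diagram: CPU C still sees a completed entry (refs == 0),
 * CPU A has rewritten cpumask for its next call but has not yet set refs.
 * Returns what CPU C's handler observed.
 */
static int replay(int (*handler)(struct cfd *, int))
{
	struct cfd d;
	const int cpu_c = 2;
	int seen;

	atomic_init(&d.cpumask, 0);	/* stale entry: previous call is done */
	atomic_init(&d.refs, 0);

	/* CPU A: reuses the entry and sets data->cpumask */
	atomic_store_explicit(&d.cpumask, 1UL << cpu_c, memory_order_relaxed);

	/* CPU C: interrupt handler runs in the window */
	seen = handler(&d, cpu_c);

	/* CPU A: smp_wmb() then sets data->refs (too late for CPU C) */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d.refs, 1, memory_order_relaxed);
	assert(atomic_load(&d.refs) == 1);	/* sanity: late store landed */

	return seen;
}
```

Calling replay(handle_unpatched) returns -1, which is the refs < 0 condition
the WARN_ON catches, while replay(handle_patched) returns SKIPPED because the
first read of refs already sees 0 and the entry is left alone.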
