Re: [PATCH] smp_call_function_many SMP race


From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Anton Blanchard <anton@samba.org>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>,
	Ingo Molnar <mingo@elte.hu>, Jens Axboe <jens.axboe@oracle.com>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	Peter Zijlstra <peterz@infradead.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Milton Miller <miltonm@bga.com>, Nick Piggin <npiggin@suse.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] smp_call_function_many SMP race
Date: Tue, 23 Mar 2010 09:41:24 -0700
Message-ID: <20100323164124.GN2517@linux.vnet.ibm.com>
In-Reply-To: <20100323111556.GK24064@kryten>

On Tue, Mar 23, 2010 at 10:15:56PM +1100, Anton Blanchard wrote:
> 
> I noticed a failure where we hit the following WARN_ON in
> generic_smp_call_function_interrupt:
> 
>                 if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))
>                         continue;
> 
>                 data->csd.func(data->csd.info);
> 
>                 refs = atomic_dec_return(&data->refs);
>                 WARN_ON(refs < 0);      <-------------------------
> 
> We atomically tested and cleared our bit in the cpumask, and yet the number
> of cpus left (i.e. refs) was 0. How can this be?
> 
> It turns out commit c0f68c2fab4898bcc4671a8fb941f428856b4ad5 (generic-ipi:
> cleanup for generic_smp_call_function_interrupt()) is at fault. It removes
> locking from smp_call_function_many and in doing so creates a rather
> complicated race.
> 
> The problem comes about because:
> 
> - The smp_call_function_many interrupt handler walks call_function.queue
>   without any locking.
> - We reuse a percpu data structure in smp_call_function_many.
> - We do not wait for any RCU grace period before starting the next
>   smp_call_function_many.
> 
> Imagine a scenario where CPU A does two smp_call_functions back to back, and
> CPU B does an smp_call_function in between. We concentrate on how CPU C handles
> the calls:
> 
> 
> CPU A                  CPU B                  CPU C
> 
> smp_call_function
>                                               smp_call_function_interrupt
>                                                 walks call_function.queue
>                                                 sees CPU A on list
> 
>                          smp_call_function
> 
>                                               smp_call_function_interrupt
>                                                 walks call_function.queue
>                                                 sees (stale) CPU A on list
> smp_call_function
>   reuses percpu *data
>   set data->cpumask
>                                                 sees and clears bit in cpumask!
>                                                 sees data->refs is 0!
> 
>   set data->refs (too late!)
> 
> 
> The important thing to note is since the interrupt handler walks a potentially
> stale call_function.queue without any locking, then another cpu can view the
> percpu *data structure at any time, even when the owner is in the process
> of initialising it.
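To make the interleaving concrete, here is a minimal single-threaded replay of the diagram, written as userspace C rather than kernel code. `struct sim_data`, `sim_handler()` and `replay_race()` are hypothetical simplifications (a 64-bit bitmap for the cpumask, a plain int for refs); they only show why CPU C's decrement can go negative, not how the real per-cpu structure is laid out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for the per-cpu call data:
 * cpumask is a 64-bit bitmap (cpu must be < 64) and refs a plain int. */
struct sim_data {
	unsigned long long cpumask;
	int refs;
};

/*
 * Mirrors the unpatched handler body for one entry: test-and-clear
 * our bit, then decrement refs. Returns false if our bit was not set;
 * otherwise stores the value atomic_dec_return() would have produced.
 */
static bool sim_handler(struct sim_data *d, int cpu, int *refs_after)
{
	unsigned long long bit = 1ULL << cpu;

	if (!(d->cpumask & bit))
		return false;
	d->cpumask &= ~bit;
	/* data->csd.func() would run here. */
	*refs_after = --d->refs;
	return true;
}

/*
 * Replays the diagram: CPU A reuses *data, and CPU C (holding a stale
 * list pointer) runs its handler between A's cpumask and refs writes.
 * Returns the refs value CPU C's handler observed after decrementing.
 */
static int replay_race(int cpu_c)
{
	/* Previous call completed: mask empty, refs back to 0. */
	struct sim_data d = { .cpumask = 0, .refs = 0 };
	int refs_after = 0;

	d.cpumask = 0xEULL;			/* CPU A: set data->cpumask (CPUs 1-3) */
	sim_handler(&d, cpu_c, &refs_after);	/* CPU C runs now */
	d.refs = 3;				/* CPU A: set data->refs (too late!) */

	/* A negative value is exactly what trips WARN_ON(refs < 0). */
	return refs_after;
}
```

For a targeted CPU such as 2, `replay_race(2)` yields -1: the handler cleared a bit that was set while refs was still 0.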
> 
> The following test case hits the WARN_ON 100% of the time on my PowerPC box
> (having 128 threads does help :)
> 
> 
> #include <linux/module.h>
> #include <linux/init.h>
> #include <linux/smp.h>
> #include <linux/workqueue.h>
> 
> #define ITERATIONS 100
> 
> static void do_nothing_ipi(void *dummy)
> {
> }
> 
> static void do_ipis(struct work_struct *dummy)
> {
> 	int i;
> 
> 	for (i = 0; i < ITERATIONS; i++)
> 		smp_call_function(do_nothing_ipi, NULL, 1);
> 
> 	printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
> }
> 
> static struct work_struct work[NR_CPUS];
> 
> static int __init testcase_init(void)
> {
> 	int cpu;
> 
> 	for_each_online_cpu(cpu) {
> 		INIT_WORK(&work[cpu], do_ipis);
> 		schedule_work_on(cpu, &work[cpu]);
> 	}
> 
> 	return 0;
> }
> 
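> static void __exit testcase_exit(void)
> {
> 	int cpu;
> 
> 	/* Don't let the module go away while work items may still run. */
> 	for_each_online_cpu(cpu)
> 		flush_work(&work[cpu]);
> }
> 
> module_init(testcase_init);
> module_exit(testcase_exit);
> MODULE_LICENSE("GPL");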
> MODULE_AUTHOR("Anton Blanchard");
> 
> 
> I tried to fix it by ordering the read and the write of ->cpumask and ->refs.
> In doing so I missed a critical case, but thankfully Paul McKenney was able
> to spot my bug :) To ensure we aren't viewing previous iterations, the
> interrupt handler needs to read ->refs, then ->cpumask, then ->refs _again_.
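The check sequence described above can be sketched as a userspace C11 model. This is an assumption-laden illustration, not the patch: `struct model_data`, `model_publish()`, `model_should_run()` and the replay hook are hypothetical names, and C11 `atomic_thread_fence()` calls are used only as rough analogues of `smp_wmb()`/`smp_rmb()` (the kernel barriers and the C11 model are not formally identical). The replay runs single-threaded with a fixed interleaving, so it demonstrates the logic of the refs/cpumask/refs sequence, not its memory-model correctness, which is exactly the open question in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of the patched protocol; not kernel code. */
struct model_data {
	atomic_ullong cpumask;
	atomic_int refs;
};

/* Optional replay hook (hypothetical): called after the reader's first
 * refs load so a test can inject other CPUs' writes. NULL normally. */
typedef void (*model_hook)(struct model_data *d);

static int popcount64(unsigned long long x)
{
	int n = 0;

	while (x) {
		x &= x - 1;	/* clear lowest set bit */
		n++;
	}
	return n;
}

static void model_init(struct model_data *d, unsigned long long mask, int refs)
{
	atomic_init(&d->cpumask, mask);
	atomic_init(&d->refs, refs);
}

/* Writer side: cpumask first, then refs, with a fence in between. */
static void model_publish(struct model_data *d, unsigned long long mask)
{
	atomic_store_explicit(&d->cpumask, mask, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* ~ smp_wmb() */
	atomic_store_explicit(&d->refs, popcount64(mask), memory_order_relaxed);
}

/* Reader side: refs, then cpumask, then refs again, then claim our bit. */
static bool model_should_run(struct model_data *d, int cpu, model_hook hook)
{
	unsigned long long bit = 1ULL << cpu;

	if (atomic_load_explicit(&d->refs, memory_order_relaxed) == 0)
		return false;
	atomic_thread_fence(memory_order_acquire);	/* ~ smp_rmb() */
	if (hook)
		hook(d);
	if (!(atomic_load_explicit(&d->cpumask, memory_order_relaxed) & bit))
		return false;
	atomic_thread_fence(memory_order_acquire);	/* ~ smp_rmb() */
	if (atomic_load_explicit(&d->refs, memory_order_relaxed) == 0)
		return false;
	/* ~ cpumask_test_and_clear_cpu(): only one claimant wins the bit. */
	return (atomic_fetch_and(&d->cpumask, ~bit) & bit) != 0;
}

/* Replay step: CPU 3 finishes the old call, then the owner rewrites
 * cpumask for the next call but has not yet written refs. */
static void stale_reuse_hook(struct model_data *d)
{
	atomic_fetch_and(&d->cpumask, ~(1ULL << 3));	/* CPU 3 clears its bit */
	atomic_store(&d->refs, 0);			/* ...and refs drops to 0 */
	atomic_store_explicit(&d->cpumask, 0xEULL, memory_order_relaxed);
}

/* Reader on CPU 2 starts while the old call is pending only on CPU 3. */
static bool replay_stale_reuse(void)
{
	struct model_data d;

	model_init(&d, 1ULL << 3, 1);
	return model_should_run(&d, 2, stale_reuse_hook);
}
```

In the replay, the first load sees the stale nonzero refs and the cpumask load sees the new bit, so only the second refs load (now 0) makes the reader skip the entry. Without that second load, `replay_stale_reuse()` would claim the bit while refs is still 0, which is the WARN_ON case.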
> 
> Thanks to Milton Miller and Paul McKenney for helping to debug this issue.
> 
> ---
> 
> My head hurts. This needs some serious analysis before we can be sure it
> fixes all the races. With all these memory barriers, maybe the previous
> spinlocks weren't so bad after all :)

;-)

Does this patch appear to have fixed things, or do you still have a
failure rate?  In other words, should I be working on a proof of
(in)correctness, or should I be looking for further bugs?

							Thanx, Paul

> Index: linux-2.6/kernel/smp.c
> ===================================================================
> --- linux-2.6.orig/kernel/smp.c	2010-03-23 05:09:08.000000000 -0500
> +++ linux-2.6/kernel/smp.c	2010-03-23 06:12:40.000000000 -0500
> @@ -193,6 +193,31 @@ void generic_smp_call_function_interrupt
>  	list_for_each_entry_rcu(data, &call_function.queue, csd.list) {
>  		int refs;
> 
> +		/*
> +		 * Since we walk the list without any locks, we might
> +		 * see an entry that was completed, removed from the
> +		 * list and is in the process of being reused.
> +		 *
> +		 * Just checking data->refs then data->cpumask is not good
> +		 * enough because we could see a non zero data->refs from a
> +		 * previous iteration. We need to check data->refs, then
> +		 * data->cpumask then data->refs again. Talk about
> +		 * complicated!
> +		 */
> +
> +		if (atomic_read(&data->refs) == 0)
> +			continue;
> +
> +		smp_rmb();
> +
> +		if (!cpumask_test_cpu(cpu, data->cpumask))
> +			continue;
> +
> +		smp_rmb();
> +
> +		if (atomic_read(&data->refs) == 0)
> +			continue;
> +
>  		if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))
>  			continue;
> 
> @@ -446,6 +471,14 @@ void smp_call_function_many(const struct
>  	data->csd.info = info;
>  	cpumask_and(data->cpumask, mask, cpu_online_mask);
>  	cpumask_clear_cpu(this_cpu, data->cpumask);
> +
> +	/*
> +	 * To ensure the interrupt handler gets an up to date view
> +	 * we order the cpumask and refs writes and order the
> +	 * read of them in the interrupt handler.
> +	 */
> +	smp_wmb();
> +
>  	atomic_set(&data->refs, cpumask_weight(data->cpumask));
> 
>  	raw_spin_lock_irqsave(&call_function.lock, flags);
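For comparison with Anton's remark that "maybe the previous spinlocks weren't so bad after all", here is a hypothetical lock-based variant in userspace C11. It is not a reconstruction of the code that commit c0f68c2 removed; `struct locked_data`, `locked_publish()`, `locked_handle()` and the `atomic_flag` spinlock are illustrative stand-ins. The point it shows: if the owner initialises cpumask and refs under a lock, and the handler tests and clears its bit under the same lock, a handler holding a stale pointer can never observe a half-built entry, at the price of taking a lock for every entry the handler inspects.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock-based variant; a C11 atomic_flag spinlock stands
 * in for a kernel spinlock. cpu must be < 64. */
struct locked_data {
	atomic_flag lock;
	unsigned long long cpumask;
	int refs;
};

static void sim_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the holder releases */
}

static void sim_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void locked_init(struct locked_data *d)
{
	atomic_flag_clear(&d->lock);
	d->cpumask = 0;
	d->refs = 0;
}

/* Owner: cpumask and refs become visible together, under the lock. */
static void locked_publish(struct locked_data *d, unsigned long long mask,
			   int nr_cpus)
{
	sim_spin_lock(&d->lock);
	d->cpumask = mask;
	d->refs = nr_cpus;
	sim_spin_unlock(&d->lock);
}

/*
 * Handler: claim our bit under the lock, run func() unlocked, then
 * drop refs under the lock. The claim sees either a fully-built entry
 * or one with our bit already clear, never a mask without its refs.
 */
static bool locked_handle(struct locked_data *d, int cpu,
			  void (*func)(void *), void *info)
{
	unsigned long long bit = 1ULL << cpu;
	bool claimed = false;

	sim_spin_lock(&d->lock);
	if (d->refs > 0 && (d->cpumask & bit)) {
		d->cpumask &= ~bit;
		claimed = true;
	}
	sim_spin_unlock(&d->lock);
	if (!claimed)
		return false;

	func(info);

	sim_spin_lock(&d->lock);
	d->refs--;
	sim_spin_unlock(&d->lock);
	return true;
}

/* Example callback: counts how many times it was invoked. */
static void count_call(void *info)
{
	++*(int *)info;
}
```

Refs is decremented only after `func()` returns, so an owner waiting for refs to reach 0 still waits for completion, as in the lockless version.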
