From: Andrew Morton <akpm@linux-foundation.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org, torvalds@linux-foundation.org,
	mingo@elte.hu, tglx@linutronix.de, peterz@infradead.org,
	arjan@infradead.org, rusty@rustcorp.com.au,
	Jens Axboe <jens.axboe@oracle.com>
Subject: Re: Buggy IPI and MTRR code on low memory
Date: Wed, 28 Jan 2009 13:12:02 -0800
Message-ID: <20090128131202.21757da6.akpm@linux-foundation.org>
In-Reply-To: <alpine.DEB.1.10.0901281029150.25359@gandalf.stny.rr.com>

On Wed, 28 Jan 2009 11:38:14 -0500 (EST)
Steven Rostedt <rostedt@goodmis.org> wrote:

> 
> While developing the RT git tree I came across this deadlock.
> 
> To avoid touching the memory allocator in smp_call_function_many I forced 
> the stack use case, the path that would be taken if data fails to 
> allocate.
> 
> Here's the current code in kernel/smp.c:
> 
> void smp_call_function_many(const struct cpumask *mask,
>                             void (*func)(void *), void *info,
>                             bool wait)
> {
>         struct call_function_data *data;
> [...]
>         data = kmalloc(sizeof(*data) + cpumask_size(), GFP_ATOMIC);
>         if (unlikely(!data)) {
>                 /* Slow path. */
>                 for_each_online_cpu(cpu) {
>                         if (cpu == smp_processor_id())
>                                 continue;
>                         if (cpumask_test_cpu(cpu, mask))
>                                 smp_call_function_single(cpu, func, info,
>                                                          wait);
>                 }
>                 return;
>         }
> [...]
> 
> int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
>                              int wait)
> {
>         struct call_single_data d;
> [...]
>                 if (!wait) {
>                         data = kmalloc(sizeof(*data), GFP_ATOMIC);
>                         if (data)
>                                 data->flags = CSD_FLAG_ALLOC;
>                 }
>                 if (!data) {
>                         data = &d;
>                         data->flags = CSD_FLAG_WAIT;
>                 }
> 
> Note that if data failed to allocate, we force the wait state.
> 
> 
> This immediately caused a deadlock with the mtrr code:
> 
>  arch/x86/kernel/cpu/mtrr/main.c:
> 
> static void set_mtrr(unsigned int reg, unsigned long base,
>                      unsigned long size, mtrr_type type)
> {
>         struct set_mtrr_data data;
> [...]
>         /*  Start the ball rolling on other CPUs  */
>         if (smp_call_function(ipi_handler, &data, 0) != 0)
>                 panic("mtrr: timed out waiting for other CPUs\n");
> 
>         local_irq_save(flags);
> 
>         while(atomic_read(&data.count))
>                 cpu_relax();
> 
>         /* ok, reset count and toggle gate */
>         atomic_set(&data.count, num_booting_cpus() - 1);
>         smp_wmb();
>         atomic_set(&data.gate,1);
> 
> [...]
> 
> static void ipi_handler(void *info)
> /*  [SUMMARY] Synchronisation handler. Executed by "other" CPUs.
>     [RETURNS] Nothing.
> */
> {
> #ifdef CONFIG_SMP
>         struct set_mtrr_data *data = info;
>         unsigned long flags;
> 
>         local_irq_save(flags);
> 
>         atomic_dec(&data->count);
>         while(!atomic_read(&data->gate))
>                 cpu_relax();
> 
> 
> The problem is that if we use the stack, then we must wait for the 
> function to finish. But in the mtrr code, the called functions are waiting 
> for the caller to do something after the smp_call_function. Thus we 
> deadlock! This mtrr code seems to have been there for a while. At least 
> longer than the git history.

My initial reaction is that the mtrr code is being stupid, but I guess
that strengthening the smp_call_function() stuff is good, and we _do_
have this "wait=0" contract.

> To get around this, I did the following hack. Now this may be good 
> enough to handle the case. I'm posting it for comments.
> 
> The patch creates another flag called CSD_FLAG_RELEASE. If we fail
> to alloc the data and the wait bit is not set, we still use the stack
> but we also set this flag instead of the wait flag. The receiving IPI 
> will copy the data locally, and if this flag is set, it will clear it. The 
> caller, after sending the IPI, will wait on this flag to be cleared.
> 
> The difference between this and the wait bit is that the release bit is 
> just a way to let the callee tell the caller that it copied the data and 
> is continuing. The data can be released with no worries. This prevents the 
> deadlock because the caller can continue without waiting for the functions 
> to be called.
> 
> I tested this patch by forcing the data to be null:
> 
> 	data = NULL; // kmalloc(...);
> 
> Also, when forcing data to be NULL on the latest git tree, without 
> applying the patch, I hit a deadlock in testing of the NMI watchdog. This 
> means there may be other areas in the kernel that assume smp_call_function, 
> without the wait bit set, will never wait.
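
The release-flag handshake described in the quoted text can be illustrated
with a minimal userspace sketch. This is not the kernel patch: it assumes
pthreads and C11 atomics stand in for CPUs and IPIs, and every name in it
(csd_sketch, call_nowait_on_stack, gate_handler, demo_gate_handshake) is
hypothetical. The caller keeps the call data on its stack and waits only
until the callee has copied it, so a handler that spins on a gate the caller
opens later (the mtrr pattern) no longer deadlocks.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct call_single_data. */
struct csd_sketch {
	void (*func)(void *);
	void *info;
	atomic_int release;	/* 1 until the callee has copied func/info */
};

/* Plays the role of the IPI handler on the "other CPU". */
static void *callee_thread(void *arg)
{
	struct csd_sketch *shared = arg;
	/* Copy the call data locally before letting the caller go on. */
	void (*func)(void *) = shared->func;
	void *info = shared->info;

	/* After this store the caller's stack copy must not be touched. */
	atomic_store_explicit(&shared->release, 0, memory_order_release);
	func(info);
	return NULL;
}

/* Stack-based call that waits for the copy, not for func to finish. */
int call_nowait_on_stack(pthread_t *tid, void (*func)(void *), void *info)
{
	struct csd_sketch d = { .func = func, .info = info };

	assert(func != NULL);
	atomic_init(&d.release, 1);
	if (pthread_create(tid, NULL, callee_thread, &d) != 0)
		return -1;
	while (atomic_load_explicit(&d.release, memory_order_acquire))
		sched_yield();
	return 0;	/* d may now go out of scope safely */
}

/* mtrr-like usage: the handler waits for a gate the caller opens later. */
struct gate_sketch {
	atomic_int count;
	atomic_int gate;
};

static void gate_handler(void *info)
{
	struct gate_sketch *g = info;

	atomic_fetch_sub(&g->count, 1);
	while (!atomic_load(&g->gate))
		sched_yield();
}

/* Returns 0 if the whole handshake completes without deadlocking. */
int demo_gate_handshake(void)
{
	struct gate_sketch g;
	pthread_t tid;

	atomic_init(&g.count, 1);
	atomic_init(&g.gate, 0);
	if (call_nowait_on_stack(&tid, gate_handler, &g) != 0)
		return -1;
	while (atomic_load(&g.count))	/* wait for the handler to check in */
		sched_yield();
	atomic_store(&g.gate, 1);	/* open the gate, as set_mtrr() does */
	return pthread_join(tid, NULL);
}
```

If call_nowait_on_stack() instead waited for func to return, as the forced
CSD_FLAG_WAIT path does, demo_gate_handshake() would never reach the store
that opens the gate.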

Concern 1: do all architectures actually call
generic_smp_call_function_single_interrupt()?  I don't think they
_have_ to at present, and if they don't, we now have inconsistent
behaviour between architectures.

Concern 2: not all architectures set CONFIG_USE_GENERIC_SMP_HELPERS=y. 
Those which do not set CONFIG_USE_GENERIC_SMP_HELPERS might need to
have similar changes made so that the behaviour remains consistent
across architectures.

Thought: do we need to do the kmalloc at all?  Perhaps we can instead
use a statically allocated per-cpu call_single_data local to
kernel/smp.c?  It would need a spinlock or something to protect it...
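
A rough userspace sketch of that idea, assuming one statically allocated slot
per CPU guarded by an atomic busy flag instead of a spinlock. All names
(csd_slot, csd_slot_try_claim, csd_slot_run_and_release, NR_CPUS_SKETCH,
bump) are made up for illustration and this is not a proposed kernel
implementation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count for the sketch */

/* One static slot per CPU replaces the kmalloc'ed call data. */
struct csd_slot {
	atomic_bool busy;	/* acts as the lock protecting the slot */
	void (*func)(void *);
	void *info;
};

/* Static storage is zero-initialised, so every slot starts out free. */
static struct csd_slot csd_slots[NR_CPUS_SKETCH];

/* Sender side: claim the target CPU's slot, or return NULL if it is in use. */
struct csd_slot *csd_slot_try_claim(int cpu, void (*func)(void *), void *info)
{
	struct csd_slot *slot;

	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	slot = &csd_slots[cpu];
	if (atomic_exchange_explicit(&slot->busy, true, memory_order_acquire))
		return NULL;	/* a previous call is still pending */
	slot->func = func;
	slot->info = info;
	return slot;
}

/* Receiver side: copy the call out, free the slot, then run the function. */
void csd_slot_run_and_release(struct csd_slot *slot)
{
	void (*func)(void *) = slot->func;
	void *info = slot->info;

	/* Release before calling so a long-running func does not pin the slot. */
	atomic_store_explicit(&slot->busy, false, memory_order_release);
	func(info);
}

/* Trivial callback used to exercise the slot. */
void bump(void *info)
{
	++*(int *)info;
}
```

A real sender would spin or otherwise wait when the claim fails; the
try-style interface is used here only so the behaviour is easy to observe
from a single thread.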



