From: Andrew Morton <akpm@linux-foundation.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org, torvalds@linux-foundation.org,
mingo@elte.hu, tglx@linutronix.de, peterz@infradead.org,
arjan@infradead.org, rusty@rustcorp.com.au,
Jens Axboe <jens.axboe@oracle.com>
Subject: Re: Buggy IPI and MTRR code on low memory
Date: Wed, 28 Jan 2009 13:12:02 -0800
Message-ID: <20090128131202.21757da6.akpm@linux-foundation.org>
In-Reply-To: <alpine.DEB.1.10.0901281029150.25359@gandalf.stny.rr.com>

On Wed, 28 Jan 2009 11:38:14 -0500 (EST)
Steven Rostedt <rostedt@goodmis.org> wrote:
>
> While developing the RT git tree I came across this deadlock.
>
> To avoid touching the memory allocator in smp_call_function_many I forced
> the stack use case, the path that would be taken if data fails to
> allocate.
>
> Here's the current code in kernel/smp.c:
>
> void smp_call_function_many(const struct cpumask *mask,
> 			    void (*func)(void *), void *info,
> 			    bool wait)
> {
> 	struct call_function_data *data;
> [...]
> 	data = kmalloc(sizeof(*data) + cpumask_size(), GFP_ATOMIC);
> 	if (unlikely(!data)) {
> 		/* Slow path. */
> 		for_each_online_cpu(cpu) {
> 			if (cpu == smp_processor_id())
> 				continue;
> 			if (cpumask_test_cpu(cpu, mask))
> 				smp_call_function_single(cpu, func, info,
> 						wait);
> 		}
> 		return;
> 	}
> [...]
>
> int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
> 			     int wait)
> {
> 	struct call_single_data d;
> [...]
> 	if (!wait) {
> 		data = kmalloc(sizeof(*data), GFP_ATOMIC);
> 		if (data)
> 			data->flags = CSD_FLAG_ALLOC;
> 	}
> 	if (!data) {
> 		data = &d;
> 		data->flags = CSD_FLAG_WAIT;
> 	}
>
> Note that if data failed to allocate, we force the wait state.
>
>
> This immediately caused a deadlock with the mtrr code:
>
> arch/x86/kernel/cpu/mtrr/main.c:
>
> static void set_mtrr(unsigned int reg, unsigned long base,
> 		     unsigned long size, mtrr_type type)
> {
> 	struct set_mtrr_data data;
> [...]
> 	/* Start the ball rolling on other CPUs */
> 	if (smp_call_function(ipi_handler, &data, 0) != 0)
> 		panic("mtrr: timed out waiting for other CPUs\n");
>
> 	local_irq_save(flags);
>
> 	while(atomic_read(&data.count))
> 		cpu_relax();
>
> 	/* ok, reset count and toggle gate */
> 	atomic_set(&data.count, num_booting_cpus() - 1);
> 	smp_wmb();
> 	atomic_set(&data.gate,1);
>
> [...]
>
> static void ipi_handler(void *info)
> /*  [SUMMARY] Synchronisation handler. Executed by "other" CPUs.
>     [RETURNS] Nothing.
> */
> {
> #ifdef CONFIG_SMP
> 	struct set_mtrr_data *data = info;
> 	unsigned long flags;
>
> 	local_irq_save(flags);
>
> 	atomic_dec(&data->count);
> 	while(!atomic_read(&data->gate))
> 		cpu_relax();
>
>
> The problem is that if we use the stack, then we must wait for the
> function to finish. But in the mtrr code, the called functions are waiting
> for the caller to do something after the smp_call_function. Thus we
> deadlock! This mtrr code seems to have been there for a while. At least
> longer than the git history.

My initial reaction is that the mtrr code is being stupid, but I guess
that strengthening the smp_call_function() stuff is good, and we _do_
have this "wait=0" contract.

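
For illustration, the rendezvous that the mtrr code depends on can be
modelled in userspace. This is only a sketch with pthreads and C11
atomics, not kernel code: run_rendezvous(), handler(), struct rendezvous
and NR_OTHER_CPUS are made-up names, and thread creation stands in for a
wait=0 smp_call_function(). It shows why the caller has to get control
back before the handlers finish: the handlers spin on a gate that only
the caller opens.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define NR_OTHER_CPUS 3		/* hypothetical number of "other" CPUs */

struct rendezvous {
	atomic_int count;	/* handlers that have not checked in yet */
	atomic_int gate;	/* opened by the caller once all have checked in */
};

/* Stand-in for ipi_handler(): check in, then spin until the gate opens. */
static void *handler(void *info)
{
	struct rendezvous *r = info;

	atomic_fetch_sub(&r->count, 1);
	while (!atomic_load(&r->gate))
		sched_yield();	/* cpu_relax() in the kernel */
	return NULL;
}

/*
 * Stand-in for set_mtrr(). pthread_create() models a wait=0 call: it
 * returns before handler() has finished. Joining the threads before
 * opening the gate would model a forced wait and would never return.
 */
static int run_rendezvous(void)
{
	struct rendezvous r;
	pthread_t t[NR_OTHER_CPUS];
	int n, i;

	atomic_init(&r.count, NR_OTHER_CPUS);
	atomic_init(&r.gate, 0);

	for (n = 0; n < NR_OTHER_CPUS; n++)
		if (pthread_create(&t[n], NULL, handler, &r) != 0)
			break;

	if (n == NR_OTHER_CPUS)
		while (atomic_load(&r.count))
			sched_yield();	/* wait for every handler to check in */

	atomic_store(&r.gate, 1);	/* only now can the handlers return */
	for (i = 0; i < n; i++)
		pthread_join(t[i], NULL);

	if (n != NR_OTHER_CPUS)
		return -1;
	assert(atomic_load(&r.count) == 0);
	return 0;
}
```

Calling run_rendezvous() returns 0 once every handler has passed the
gate. Moving the pthread_join() loop above the atomic_store() reproduces
the hang in miniature.
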
> To get around this, I did the following hack. Now this may be good
> enough to handle the case. I'm posting it for comments.
>
> The patch creates another flag called CSD_FLAG_RELEASE. If we fail
> to alloc the data and the wait bit is not set, we still use the stack
> but we also set this flag instead of the wait flag. The receiving IPI
> will copy the data locally, and if this flag is set, it will clear it. The
> caller, after sending the IPI, will wait on this flag to be cleared.
>
> The difference between this and the wait bit is that the release bit is
> just a way to let the callee tell the caller that it copied the data and
> is continuing. The data can be released with no worries. This prevents the
> deadlock because the caller can continue without waiting for the functions
> to be called.
>
> I tested this patch by forcing the data to be null:
>
> data = NULL; // kmalloc(...);
>
> Also, when forcing data to be NULL on the latest git tree, without
> applying the patch, I hit a deadlock in testing of the NMI watchdog. This
> means there may be other areas in the kernel that expect smp_call_function,
> without the wait bit set, never to wait.
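
The release-bit handshake described in the quoted text might look
roughly like this in userspace. It is a sketch under assumptions, not
the patch itself: FLAG_RELEASE, struct call_data, callee() and
call_nowait_on_stack() are hypothetical stand-ins, and a thread plays
the part of the CPU receiving the IPI. The caller waits only until the
callee has copied the on-stack data, not until the function returns.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define FLAG_RELEASE 0x1	/* modelled on the proposed CSD_FLAG_RELEASE */

struct call_data {		/* simplified stand-in for call_single_data */
	void (*func)(void *info);
	void *info;
	atomic_int flags;
};

static atomic_int gate;		/* opened by the caller after the call returns */
static atomic_int ran;		/* set once func has completed */

/* Like ipi_handler(): cannot finish until the caller opens the gate. */
static void wait_for_gate(void *info)
{
	(void)info;
	while (!atomic_load(&gate))
		sched_yield();
	atomic_store(&ran, 1);
}

/* Callee: copy the request, release the caller's stack slot, then run it. */
static void *callee(void *arg)
{
	struct call_data *remote = arg;
	void (*func)(void *) = remote->func;
	void *info = remote->info;

	/* After this store the caller's stack slot must not be touched. */
	atomic_fetch_and(&remote->flags, ~FLAG_RELEASE);
	func(info);
	return NULL;
}

/* Caller: wait for the copy to be taken, not for func to return. */
static int call_nowait_on_stack(pthread_t *t, void (*func)(void *), void *info)
{
	struct call_data d = { .func = func, .info = info };

	atomic_init(&d.flags, FLAG_RELEASE);
	if (pthread_create(t, NULL, callee, &d) != 0)
		return -1;
	while (atomic_load(&d.flags) & FLAG_RELEASE)
		sched_yield();
	return 0;		/* d can go out of scope: callee holds a copy */
}

static int demo_release(void)
{
	pthread_t t;

	if (call_nowait_on_stack(&t, wait_for_gate, NULL) != 0)
		return -1;
	atomic_store(&gate, 1);	/* the caller-side work the callee waits for */
	pthread_join(t, NULL);
	return atomic_load(&ran);
}
```

demo_release() returns 1 when the callee ran to completion. With a
wait-style flag that is cleared only after func() returns, the same
sequence would hang in call_nowait_on_stack().
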

Concern 1: do all architectures actually call
generic_smp_call_function_single_interrupt()? I don't think they
_have_ to at present, and if they don't, we now have inconsistent
behaviour between architectures.

Concern 2: not all architectures set CONFIG_USE_GENERIC_SMP_HELPERS=y.
Those which do not set CONFIG_USE_GENERIC_SMP_HELPERS might need to
have similar changes made so that the behaviour remains consistent
across architectures.

Thought: do we need to do the kmalloc at all? Perhaps we can instead
use a statically allocated per-cpu call_single_data local to
kernel/smp.c? It would need a spinlock or something to protect it...
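
A rough userspace model of that idea, assuming one static slot per CPU
guarded by a lock bit rather than a real spinlock. Everything here is
hypothetical (struct call_slot, slots[], SLOT_LOCKED, slot_claim(),
slot_run(), NR_CPUS); it only sketches the claim/copy/unlock ordering and
leaves out the IPI delivery itself.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 4		/* hypothetical CPU count */
#define SLOT_LOCKED 0x1		/* hypothetical "slot in use" bit */

struct call_slot {		/* simplified stand-in for call_single_data */
	void (*func)(void *info);
	void *info;
	atomic_int flags;
};

static struct call_slot slots[NR_CPUS];	/* one static slot per CPU, no kmalloc */

/* Sender: claim a slot, spinning while an earlier call still owns it. */
static struct call_slot *slot_claim(int cpu, void (*func)(void *), void *info)
{
	struct call_slot *s = &slots[cpu];
	int expected = 0;

	while (!atomic_compare_exchange_weak(&s->flags, &expected, SLOT_LOCKED))
		expected = 0;	/* a failed exchange overwrites expected */
	s->func = func;
	s->info = info;
	return s;
}

/* Receiver: copy the request out, unlock the slot, then run the function. */
static void slot_run(struct call_slot *s)
{
	void (*func)(void *) = s->func;
	void *info = s->info;

	atomic_store(&s->flags, 0);
	func(info);
}

static void bump(void *info)
{
	++*(int *)info;
}

/* Single-threaded walk-through: returns the number of calls delivered. */
static int demo_slots(void)
{
	int calls = 0;
	struct call_slot *s;

	s = slot_claim(1, bump, &calls);
	assert(atomic_load(&s->flags) == SLOT_LOCKED);
	slot_run(s);

	s = slot_claim(1, bump, &calls);	/* slot is reusable once unlocked */
	slot_run(s);
	return calls;
}
```

Because slot_run() unlocks before calling func(), a function that blocks
on the sender cannot keep the slot busy, which is the property the
wait=0 callers need.
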