From: Prarit Bhargava <prarit@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, jbeulich@novell.com,
anil.s.keshavamurthy@intel.com, amul.shah@unisys.com
Subject: Re: [PATCH]: Use stop_machine_run in the Intel RNG driver
Date: Thu, 01 Mar 2007 07:11:30 -0500 [thread overview]
Message-ID: <45E6C2F2.8090007@redhat.com> (raw)
In-Reply-To: <20070228221619.dc135f3c.akpm@linux-foundation.org>
> I think what you're describing here is just the standard old
> smp_call_function() deadlock, rather than anything which is specific to
> intel-rng, yes?
>
> It is "well known" that you can't call smp_call_function() with local
> interrupts disabled. In fact i386 will spit a warning if you try it.
>
>
> intel-rng doesn't do that, but what it _does_ do is:
>
> smp_call_function(..., wait = 0);
> local_irq_disable();
>
> so some CPUs will still be entering the IPI while this CPU has gone and
> disabled interrupts, thus exposing us to the deadlock, yes?
>
Not quite Andrew. This was a different and little more complicated.
The deadlock occurs because two CPUs are in contention over a rw_lock
and the "call_function" puts the CPUs in such a state that no forward
progress will be made until the calling CPU has completed it's code.
Here's a more detailed example (sorry for the cut-and-paste):
1. CPU A has done read_lock(&lock), and has acquired the lock.
2. CPU B has done write_lock_irq(&lock) and is waiting for A to release
the lock. CPU B has disabled interrupts while waiting for the interrupt:
void __lockfunc _write_lock_irq(rwlock_t *lock)
{
local_irq_disable();
preempt_disable();
rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
_raw_write_lock(lock);
}
3. CPU C issues smp_call_function, as in the case of the intel-rng driver:
set_mb(waitflag, 1);
smp_call_function(intel_init_wait, NULL, 1, 0);
...
// do some stuff with interrupts disabled
...
set_mb(waitflag, 0);
where
static char __initdata waitflag;
static void __init intel_init_wait(void *unused)
{
while (waitflag)
cpu_relax();
}
In this code the calling processor, C, has issued an IPI and disabled
interrupts on every processor except itself. When each processor takes
the IPI it runs intel_init_wait and waits in a tight loop until
waitflag is zero. ie) no forward progress on any CPU.
CPU C will not execute the code below the smp_call_function until all
processors have started (not completed!) the IPI function. From
call_smp_function:
cpus = num_online_cpus() - 1;
...
/* Send a message to all other CPUs and wait for them to respond */
send_IPI_allbutself(CALL_FUNCTION_VECTOR);
/* Wait for response */
while (atomic_read(&data.started) != cpus)
cpu_relax();
So CPU C is waiting here.
4. CPU A, which holds the lock sees the IPI and is in the
intel_init_wait code, happily waiting. CPU A has incremented
data.started. CPU A will stay in this loop until CPU C sets waitflag = 0.
5. CPU B, if you recall is _waiting with interrupts disabled_ for CPU A
to release the lock. It does not see the IPI because it has interrupts
disabled. It will not see the IPI until CPU A has released the lock.
6. CPU C is eventually only waiting for CPU B to do the final increment
of data.started = cpus. CPU B is waiting for CPU A to release the
lock. CPU A is executing a tight loop which it will not exit from until
CPU C can set waitflag to zero.
That's a 3-way deadlock.
So, the issue is placing the other CPUs in a state that they do not make
forward progress. The deadlock occurs before the calling CPU has
disabled interrupts in the code in step 3.
I also tested this code without the __init tags and explicitly coding
waitflag=0 to avoid the gcc only setting .bss section variables to zero
error that someone fixed last week. I also removed the code that
disabled interrupts on the calling processor which had no effect -- at
first I thought it was a simple interrupt issue ...
Maybe smp_call_function needs a written warning that the called function
should not "suspend" CPUs?
P.
prev parent reply other threads:[~2007-03-01 12:11 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-02-27 12:08 [PATCH]: Use stop_machine_run in the Intel RNG driver Prarit Bhargava
2007-02-27 12:22 ` Prarit Bhargava
2007-03-01 6:16 ` Andrew Morton
2007-03-01 12:11 ` Prarit Bhargava [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=45E6C2F2.8090007@redhat.com \
--to=prarit@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=amul.shah@unisys.com \
--cc=anil.s.keshavamurthy@intel.com \
--cc=jbeulich@novell.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.