public inbox for linux-ia64@vger.kernel.org
* Deadlock in ia64_mca_cmc_int_caller
@ 2003-12-06  4:16 Keith Owens
  2003-12-06 15:23 ` Alex Williamson
  2003-12-06 22:50 ` Keith Owens
  0 siblings, 2 replies; 3+ messages in thread
From: Keith Owens @ 2003-12-06  4:16 UTC
  To: linux-ia64

ia64_mca_cmc_int_caller() calls smp_call_function(), which waits until
all cpus have taken the IPI before returning.  This interacts badly
with locks that are sometimes taken with interrupts disabled and
sometimes with interrupts enabled: smp_call_function() can deadlock.

cpu 3                                           cpu 0
Holds tasklist_lock with interrupts enabled,
it did read_lock() or write_lock().

                                                Does read_lock_irq() or
                                                write_lock_irq().  Spinning
                                                disabled waiting for tasklist_lock.

CMC interrupt occurs

ia64_mca_cmc_int_caller() calls smp_call_function()

smp_call_function() sends IPI to other cpus

                                                IPI on cpu 0 blocked, it is disabled
                                                waiting for tasklist_lock.

smp_call_function() waits until IPI reaches
all other cpus.

cpu 0 never responds, we never release the
tasklist lock, deadlock.

AFAICT it is never safe to call smp_call_function() from an interrupt
handler.

The unsafe nature of smp_call_function is not ia64 specific.  ix86 can
deadlock this way if any ix86 code calls smp_call_function from an
interrupt handler.
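
The sequence above can be reproduced in a toy, single-threaded model.
This is only a sketch, not kernel code: struct cpu, step() and
scenario_deadlocks() are hypothetical names made up for illustration,
and the lock is simplified to an exclusive one.  Each call to step()
lets one cpu make progress if it can; the scenario counts as
deadlocked when no cpu can move while work is still outstanding.

```c
#include <assert.h>
#include <stdbool.h>

enum { NCPUS = 2 };

/* Hypothetical per-cpu state for the model; none of this is a kernel API. */
struct cpu {
    bool irqs_enabled;      /* can this cpu take an IPI right now? */
    bool holds_lock;        /* holds the modeled tasklist_lock */
    bool wants_lock;        /* spinning, waiting for the lock */
    bool ipi_pending;       /* an IPI was sent here and not yet taken */
    bool waiting_for_acks;  /* inside the modeled smp_call_function() */
};

/* Let one cpu make one move; return false if nobody can move. */
static bool step(struct cpu cpus[NCPUS])
{
    for (int i = 0; i < NCPUS; i++) {
        struct cpu *c = &cpus[i];
        struct cpu *other = &cpus[1 - i];

        /* An IPI is only taken while interrupts are enabled. */
        if (c->ipi_pending && c->irqs_enabled) {
            c->ipi_pending = false;
            return true;
        }
        /* The sender returns once the IPI was taken, then drops the lock. */
        if (c->waiting_for_acks && !other->ipi_pending) {
            c->waiting_for_acks = false;
            c->holds_lock = false;
            return true;
        }
        /* A spinner gets the lock once the other cpu has released it. */
        if (c->wants_lock && !other->holds_lock) {
            c->wants_lock = false;
            c->holds_lock = true;
            return true;
        }
    }
    return false;
}

/* Returns true if the model gets stuck with work still outstanding. */
bool scenario_deadlocks(bool waiter_irqs_enabled)
{
    struct cpu cpus[NCPUS] = {
        /* "cpu 0": spinning for the lock, IPI already sent to it */
        [0] = { .irqs_enabled = waiter_irqs_enabled,
                .wants_lock = true, .ipi_pending = true },
        /* "cpu 3": holds the lock, took the CMC interrupt, waits for acks */
        [1] = { .irqs_enabled = true,
                .holds_lock = true, .waiting_for_acks = true },
    };

    while (step(cpus))
        assert(!(cpus[0].holds_lock && cpus[1].holds_lock));

    return cpus[0].wants_lock || cpus[0].ipi_pending ||
           cpus[1].waiting_for_acks;
}
```

In this model scenario_deadlocks(false) corresponds to the
read_lock_irq()/write_lock_irq() case in the diagram, while
scenario_deadlocks(true) corresponds to a waiter that spins with
interrupts enabled and can therefore still take the IPI.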



* Re: Deadlock in ia64_mca_cmc_int_caller
  2003-12-06  4:16 Deadlock in ia64_mca_cmc_int_caller Keith Owens
@ 2003-12-06 15:23 ` Alex Williamson
  2003-12-06 22:50 ` Keith Owens
  1 sibling, 0 replies; 3+ messages in thread
From: Alex Williamson @ 2003-12-06 15:23 UTC
  To: linux-ia64

Keith,

   We debugged a similar problem with the old CMC/CPE code recently. 
However, the latest version in 2.4/2.6 fixed that problem.  So are you
actually hitting a deadlock when ia64_mca_cmc_int_caller() calls
smp_call_function(ia64_mca_cmc_vector_enable, NULL, 1, 0)?  I've reached
the same conclusion about smp_call_function, my mistake for using it in
the first place, it's way too dangerous.  We need to enable/disable the
CMC vector in a better context or use another mechanism.

	Alex



* Re: Deadlock in ia64_mca_cmc_int_caller
  2003-12-06  4:16 Deadlock in ia64_mca_cmc_int_caller Keith Owens
  2003-12-06 15:23 ` Alex Williamson
@ 2003-12-06 22:50 ` Keith Owens
  1 sibling, 0 replies; 3+ messages in thread
From: Keith Owens @ 2003-12-06 22:50 UTC
  To: linux-ia64

On Sat, 06 Dec 2003 08:23:50 -0700, 
Alex Williamson <alex.williamson@hp.com> wrote:
>   We debugged a similar problem with the old CMC/CPE code recently. 
>However, the latest version in 2.4/2.6 fixed that problem.  So are you
>actually hitting a deadlock when ia64_mca_cmc_int_caller() calls
>smp_call_function(ia64_mca_cmc_vector_enable, NULL, 1, 0)?

Yes, at the point that smp_call_function is spinning on

  while (atomic_read(&data.started) != cpus)

The cpus that were not responding were spinning with interrupts
disabled, waiting for tasklist_lock.  I am assuming that tasklist_lock
is held by the current cpu, i.e. the one spinning in
smp_call_function().

>I've reached
>the same conclusion about smp_call_function, my mistake for using it in
>the first place, it's way too dangerous.

Using smp_call_function in any interrupt context is unsafe; we should
add a badness check to smp_call_function for that state.  I think that
bh context is bad as well, but I need to confirm that.  Of course it is
not interrupt/bh context per se that is bad, but the interaction of
those contexts with spinlocks that are sometimes taken with interrupts
enabled and sometimes with interrupts disabled, combined with
synchronizing across cpus.
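
As a rough illustration of what such a badness check could look like,
here is a user-space sketch.  Everything in it is an assumption made
for illustration: irq_depth, model_irq_enter(), model_irq_exit(),
model_in_interrupt() and checked_cross_call() are hypothetical
stand-ins, not kernel interfaces.  A real check would presumably key
off the kernel's own context tests, such as in_interrupt(), rather than
a hand-rolled counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-cpu interrupt nesting state. */
static _Thread_local int irq_depth;

static void model_irq_enter(void)
{
    irq_depth++;
}

static void model_irq_exit(void)
{
    assert(irq_depth > 0);  /* exit must pair with an earlier enter */
    irq_depth--;
}

static int model_in_interrupt(void)
{
    return irq_depth > 0;
}

/*
 * Hypothetical wrapper: complain and refuse instead of risking the
 * deadlock.  The model runs func locally; a real implementation would
 * do the cross-cpu call here.
 */
int checked_cross_call(void (*func)(void *info), void *info)
{
    if (model_in_interrupt()) {
        fprintf(stderr, "badness: cross-cpu call from interrupt context\n");
        return -EDEADLK;
    }
    func(info);
    return 0;
}

/* Demo callback and two callers, one per context. */
static void bump(void *info)
{
    ++*(int *)info;
}

int call_from_process_context(int *counter)
{
    return checked_cross_call(bump, counter);
}

int call_from_irq_context(int *counter)
{
    int rc;

    model_irq_enter();
    rc = checked_cross_call(bump, counter);
    model_irq_exit();
    return rc;
}
```

Whether the check should only warn or also refuse the call is a design
choice; the sketch refuses so that the difference between the two
contexts is visible in the return value.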

>We need to enable/disable the
>CMC vector in a better context or use another mechanism.

Since the only safe time to use smp_call_function is with no spinlocks
held on the current cpu, that restricts us to a user context thread.
Create a kernel thread called smp_call_nowait that waits on a semaphore
which CMC/CPE does up() on.  Use a list of kmalloc(GFP_ATOMIC)
structures containing

  struct list_head list;
  void (*func)(void *info);
  void *info;
  char info_data[];        /* variable length */

When smp_call_nowait wakes up, it takes the first entry off the list,
calls smp_call_function with wait=1 then kfrees the list entry.  The
'_nowait' part of the thread name indicates that the original caller
does not wait for the smp function to take effect.
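
The shape of that design can be sketched in user space with pthreads
and a POSIX semaphore.  This is a model under stated assumptions, not
the kernel patch: struct deferred_call, queue_call(), worker() and
run_demo() are hypothetical names, malloc()/free() stand in for
kmalloc(GFP_ATOMIC)/kfree(), sem_post() stands in for up(), and the
worker calls func directly where the real thread would call
smp_call_function() with wait=1.  A kernel version could not take a
mutex from interrupt context and would need an interrupt-safe spinlock
around the list instead.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical list entry, mirroring the fields listed above. */
struct deferred_call {
    struct deferred_call *next;
    void (*func)(void *info);
    void *info;          /* points at info_data below */
    char info_data[];    /* private copy of the caller's argument */
};

static struct deferred_call *head, *tail;
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t work_sem;

/* Queue a call and return at once; the caller never waits for func. */
int queue_call(void (*func)(void *info), const void *data, size_t len)
{
    struct deferred_call *e = malloc(sizeof(*e) + len);

    if (!e)
        return -1;
    e->next = NULL;
    e->func = func;
    e->info = e->info_data;
    if (len)
        memcpy(e->info_data, data, len);

    pthread_mutex_lock(&list_lock);
    if (tail)
        tail->next = e;
    else
        head = e;
    tail = e;
    pthread_mutex_unlock(&list_lock);

    sem_post(&work_sem);    /* models up() */
    return 0;
}

/* Worker: take the first entry off the list, run it, free it. */
static void *worker(void *unused)
{
    (void)unused;
    for (;;) {
        struct deferred_call *e;

        while (sem_wait(&work_sem) != 0)
            ;               /* retry if interrupted by a signal */

        pthread_mutex_lock(&list_lock);
        e = head;
        assert(e != NULL);  /* one post per queued entry */
        head = e->next;
        if (!head)
            tail = NULL;
        pthread_mutex_unlock(&list_lock);

        if (!e->func) {     /* NULL func is this demo's stop request */
            free(e);
            return NULL;
        }
        e->func(e->info);
        free(e);
    }
}

static int total;           /* written only by the worker thread */

static void add_value(void *info)
{
    int v;

    memcpy(&v, info, sizeof(v));
    total += v;
}

/* Queue 1, 2 and 3, stop the worker, and return the sum it computed. */
int run_demo(void)
{
    pthread_t tid;

    total = 0;
    head = tail = NULL;
    sem_init(&work_sem, 0, 0);
    pthread_create(&tid, NULL, worker, NULL);
    for (int v = 1; v <= 3; v++)
        queue_call(add_value, &v, sizeof(v));
    queue_call(NULL, NULL, 0);
    pthread_join(tid, NULL);
    sem_destroy(&work_sem);
    return total;
}
```

Copying the argument into info_data is what lets the original caller
return immediately: by the time the worker runs func, the caller's own
copy of the data may already be gone.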

I will code this up on Monday.


