public inbox for linux-ia64@vger.kernel.org
 help / color / mirror / Atom feed
* Race in salinfo.c?
@ 2003-10-13  8:55 Keith Owens
  2003-10-13 15:24 ` Bjorn Helgaas
  2003-10-16  6:58 ` Keith Owens
  0 siblings, 2 replies; 3+ messages in thread
From: Keith Owens @ 2003-10-13  8:55 UTC (permalink / raw)
  To: linux-ia64

I need a sanity check on the existing salinfo.c code, it looks like a
race.

salinfo_log_wakeup()

  set_bit(smp_processor_id(), &event->cpu_mask);
  wake_up_interruptible(&event->queue);

salinfo_event_read()

  if (!event->cpu_mask) {
    ...
    interruptible_sleep_on(&event->queue);



      cpu 0                                   cpu 1
salinfo_log_wakeup                      salinfo_event_read
                                        event->cpu_mask = 0
set_bit()
wake_up_interruptible(), nothing on
the queue
                                        interruptible_sleep_on()

cpu 1 missed the event.  It will only be picked up if another salinfo
event occurs and calls wake_up_interruptible() a second time.

If that race exists (I cannot see anything that prevents it), the only
solution is to grab BKL around accesses of event->cpu_mask.  It has to
be BKL, we have no atomic operation that does 

  drop_lock();
  interruptible_sleep_on();
  acquire_lock();

I am already doing other work on salinfo.c.  If this race is real then
I will do a patch to close it.


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Race in salinfo.c?
  2003-10-13  8:55 Race in salinfo.c? Keith Owens
@ 2003-10-13 15:24 ` Bjorn Helgaas
  2003-10-16  6:58 ` Keith Owens
  1 sibling, 0 replies; 3+ messages in thread
From: Bjorn Helgaas @ 2003-10-13 15:24 UTC (permalink / raw)
  To: linux-ia64

On Monday 13 October 2003 2:55 am, Keith Owens wrote:
> I need a sanity check on the existing salinfo.c code, it looks like a
> race.
> 
> salinfo_log_wakeup()
> 
>   set_bit(smp_processor_id(), &event->cpu_mask);
>   wake_up_interruptible(&event->queue);
> 
> salinfo_event_read()
> 
>   if (!event->cpu_mask) {
>     ...
>     interruptible_sleep_on(&event->queue);
> 
> 
> 
>       cpu 0                                   cpu 1
> salinfo_log_wakeup                      salinfo_event_read
>                                         event->cpu_mask = 0
> set_bit()
> wake_up_interruptible(), nothing on
> the queue
>                                         interruptible_sleep_on()
> 
> cpu 1 missed the event.  It will only be picked up if another salinfo
> event occurs and calls wake_up_interruptible() a second time.

I think you're right.


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Race in salinfo.c?
  2003-10-13  8:55 Race in salinfo.c? Keith Owens
  2003-10-13 15:24 ` Bjorn Helgaas
@ 2003-10-16  6:58 ` Keith Owens
  1 sibling, 0 replies; 3+ messages in thread
From: Keith Owens @ 2003-10-16  6:58 UTC (permalink / raw)
  To: linux-ia64

On Mon, 13 Oct 2003 09:24:42 -0600, 
Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
>On Monday 13 October 2003 2:55 am, Keith Owens wrote:
>> I need a sanity check on the existing salinfo.c code, it looks like a
>> race.
>> 
>> salinfo_log_wakeup()
>> 
>>   set_bit(smp_processor_id(), &event->cpu_mask);
>>   wake_up_interruptible(&event->queue);
>> 
>> salinfo_event_read()
>> 
>>   if (!event->cpu_mask) {
>>     ...
>>     interruptible_sleep_on(&event->queue);

>I think you're right.

More salinfo problems.  2.4.23-pre6 cset-1.1069.1.78-to-1.1103 has 
salinfo_save_record which does spin_lock_irqsave(&data_lock).
salinfo_log_open() only does spin_lock(&data_lock).  This sequence will
deadlock

sys_open()->salinfo_log_open()
  salinfo_log_open()->spin_lock(&data_lock)
  cpe or cmc interrupt->salinfo_log_wakeup()
    salinfo_log_wakeup()->salinfo_save_record()
      salinfo_save_record()->spin_lock_irqsave(&data_lock)
      Deadlock.

Besides that deadlock (which can be fixed by always using
spin_lock_irqsave on data_lock), there is still the race I described
above.  Making {set bit, wakeup} atomic wrt. {test bits, sleep}
requires BKL.  Taking BKL in salinfo_log_wakeup() can deadlock when
salinfo_log_wakeup() is called from an interrupt.

cmc and cpe records are processed in interrupt context but with
interrupts enabled.  So a cmc event can interrupt cpe and vice versa.

Most of the problem is caused by the different contexts that the
salinfo code is called from.  Sometimes it is in interrupt, sometimes
it is not.  salinfo_log_wakeup() is called in both user and interrupt
context, which affects all the downstream routines.

What about changing all salinfo.c code to always run in user context,
with interrupts enabled?  That removes all the deadlocks and lets us
use BKL to fix the race above.  It also lets us use vmalloc instead of
kmalloc, every little helps.  smp_call_function_single() gets replaced
with set_cpus_allowed() so reading the record from another cpu runs in
user, not interrupt, context.

The interface between mca.c and salinfo.c would become a tasklet.
mca.c calls salinfo_log_wakeup_schedule() which saves the buffer
address and size and schedules salinfo_log_wakeup().  The tasklet
scheduler ensures that the salinfo code only runs one instance at a
time.


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2003-10-16  6:58 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-10-13  8:55 Race in salinfo.c? Keith Owens
2003-10-13 15:24 ` Bjorn Helgaas
2003-10-16  6:58 ` Keith Owens

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox