public inbox for linux-kernel@vger.kernel.org
* "impossible" spinlock "wrong CPU" problem with custom device driver
@ 2009-07-08 22:48 Timm Korte
From: Timm Korte @ 2009-07-08 22:48 UTC
  To: lkml

I'm trying to understand a spinlock bug in a kernel module (device driver).
I have a spinlock that is used in the actual hardware interrupt handler
as well as in a separate kernel thread doing the real work via a work
queue. The interrupt handler takes the lock with spin_lock() and
spin_unlock(), while the thread uses spin_lock_irqsave() and
spin_unlock_irqrestore().
On rare occasions (I can't reproduce it on purpose), I get a spinlock debug
message about a wrong CPU in _raw_spin_unlock when it is called from the
kernel thread.
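
For context, here is a minimal userspace sketch of the kind of check that produces a "wrong CPU" report. It assumes the spinlock debugging code records the CPU that took the lock and compares it with the CPU performing the unlock. This is a simplified model, not the kernel's implementation; every name in it (model_spinlock, model_spin_lock(), model_spin_unlock() and so on) is hypothetical, and the CPU number is passed in by hand instead of being read from the hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: none of these names are kernel API. */
#define MODEL_NO_CPU (-1)

enum model_unlock_result {
    MODEL_UNLOCK_OK,
    MODEL_ALREADY_UNLOCKED,
    MODEL_WRONG_CPU
};

struct model_spinlock {
    int locked;     /* 1 while the lock is held */
    int owner_cpu;  /* CPU recorded at lock time, MODEL_NO_CPU when free */
};

static void model_spin_lock_init(struct model_spinlock *lock)
{
    assert(lock != NULL);
    lock->locked = 0;
    lock->owner_cpu = MODEL_NO_CPU;
}

/* Record which CPU took the lock (single-threaded model, no spinning). */
static void model_spin_lock(struct model_spinlock *lock, int cpu)
{
    assert(lock != NULL);
    lock->locked = 1;
    lock->owner_cpu = cpu;
}

/* Compare the recorded CPU with the CPU that performs the unlock. */
static enum model_unlock_result
model_spin_unlock(struct model_spinlock *lock, int cpu)
{
    assert(lock != NULL);
    if (!lock->locked)
        return MODEL_ALREADY_UNLOCKED;
    if (lock->owner_cpu != cpu)
        return MODEL_WRONG_CPU;
    lock->locked = 0;
    lock->owner_cpu = MODEL_NO_CPU;
    return MODEL_UNLOCK_OK;
}
```

In this model the check can fail in two ways: the unlock runs on a different CPU than the lock did, or the recorded owner_cpu value changes while the lock is held. Whether either of these applies to the driver in question is not something the sketch can show.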

This is the source (for the kernel_thread) that runs into the problem:

static int my_irqthread_function(void *ptr) {
  struct my_dev *mydev = ptr;

  daemonize(MY_NAME "%02x", mydev->mynum);
  allow_signal(SIGTERM);
  while (!wait_event_interruptible(mydev->irqthread_wait,
           atomic_read(&mydev->irqthread_pending_count))) {
    do {
      uint8_t my_irq_pending = 0;
      unsigned long iflags;

      spin_lock_irqsave(&mydev->irq_pending_lock, iflags);
      my_irq_pending = mydev->irq_pending;
      mydev->irq_pending = 0;
      spin_unlock_irqrestore(&mydev->irq_pending_lock, iflags);

      // handle irqs
      if (my_irq_pending & INT_IPAC1) {
        my_handle_interrupt(&mydev->mydev[IPAC1]);
      }
...
      // continue if the pending count still is != 0 after decrementing
    } while (!atomic_dec_and_test(&mydev->irqthread_pending_count));
  }

  mydev->irqthread = 0;
  complete_and_exit(&mydev->irqthread_exit, 0);
}

The error (SPIN_BUG with a kernel panic on my SMP box) happens on the
"spin_unlock_irqrestore(&mydev->irq_pending_lock, iflags);" line, but I
really can't figure out how the thread could be moved to another CPU
while holding the lock and doing only two assignments.

The only thing I can think of is that it might have something to do
with the enabled SIGTERM signal, even though the module wasn't being
unloaded at the time the bug occurred.

System is FC4 based with a 2.6.17 kernel (can't change).

So I'm out of ideas and hope someone here can suggest what might have
gone wrong.

Timm
