Message-ID: <4A55222E.5030405@easycrypt.de>
Date: Thu, 09 Jul 2009 00:48:14 +0200
From: Timm Korte
To: lkml
Subject: "impossible" spinlock "wrong CPU" problem with custom device driver
X-Mailing-List: linux-kernel@vger.kernel.org

I'm trying to understand a spinlock bug in a kernel module (device
driver). I have a spinlock that is used in the actual hardware interrupt
handler as well as in a separate kernel thread doing the real work via a
work queue. The interrupt handler uses the spinlock with spin_lock() and
spin_unlock(), while the thread uses spin_lock_irqsave() and
spin_unlock_irqrestore().

On rare occasions (I can't reproduce it on purpose), I get a spinlock
debug message about "wrong CPU" in _raw_spin_unlock when it is called
from the kernel thread.
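To make the symptom concrete, here is a rough userspace model of the kind of ownership check a debugging spinlock performs on unlock. It is only a sketch: it assumes the debug code records the owning task and CPU at lock time and compares both against the caller at unlock time, with the owner compared first. All names (model_lock, model_lock_acquire, model_lock_release, demo_wrong_cpu) are hypothetical and are not the kernel's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum unlock_result {
	UNLOCK_OK,
	UNLOCK_NOT_LOCKED,
	UNLOCK_WRONG_OWNER,
	UNLOCK_WRONG_CPU
};

struct model_lock {
	int locked;
	const void *owner;	/* task that took the lock, NULL when free */
	int owner_cpu;		/* CPU the lock was taken on, -1 when free */
};

static void model_lock_init(struct model_lock *l)
{
	l->locked = 0;
	l->owner = NULL;
	l->owner_cpu = -1;
}

/* Record who took the lock and where. The model never spins. */
static void model_lock_acquire(struct model_lock *l, const void *task, int cpu)
{
	assert(!l->locked);
	l->locked = 1;
	l->owner = task;
	l->owner_cpu = cpu;
}

/* Assumed check order: held at all, then same task, then same CPU. */
static enum unlock_result model_lock_release(struct model_lock *l,
					     const void *task, int cpu)
{
	if (!l->locked)
		return UNLOCK_NOT_LOCKED;
	if (l->owner != task)
		return UNLOCK_WRONG_OWNER;
	if (l->owner_cpu != cpu)
		return UNLOCK_WRONG_CPU;
	model_lock_init(l);
	return UNLOCK_OK;
}

/* Same task locks on CPU 0 and unlocks while reporting CPU 1. */
static enum unlock_result demo_wrong_cpu(void)
{
	struct model_lock l;
	int task;	/* its address stands in for a task pointer */

	model_lock_init(&l);
	model_lock_acquire(&l, &task, 0);
	return model_lock_release(&l, &task, 1);
}
```

In this model, a "wrong CPU" result means the owner comparison already passed, so the unlocking task is the one recorded at lock time and only the recorded CPU differs from the current one.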
This is the source (for the kernel_thread) that runs into the problem:

static int my_irqthread_function(void *ptr)
{
	struct my_dev *mydev = ptr;

	daemonize(MY_NAME "%02x", mydev->mynum);
	allow_signal(SIGTERM);

	while (!wait_event_interruptible(mydev->irqthread_wait,
			atomic_read(&mydev->irqthread_pending_count))) {
		do {
			uint8_t my_irq_pending = 0;
			unsigned long iflags;

			spin_lock_irqsave(&mydev->irq_pending_lock, iflags);
			my_irq_pending = mydev->irq_pending;
			mydev->irq_pending = 0;
			spin_unlock_irqrestore(&mydev->irq_pending_lock, iflags);

			// handle irqs
			if (my_irq_pending & INT_IPAC1) {
				my_handle_interrupt(&mydev->mydev[IPAC1]);
			}
			...
			// continue if the pending count still is != 0 after decrementing
		} while (!atomic_dec_and_test(&mydev->irqthread_pending_count));
	}

	mydev->irqthread = 0;
	complete_and_exit(&mydev->irqthread_exit, 0);
}

The error (SPIN_BUG with a kernel panic on my SMP box) happens on the
"spin_unlock_irqrestore(&mydev->irq_pending_lock, iflags);" line, but I
really can't figure out how the thread could be moved to another CPU
while holding the lock and only doing two assignment operations.

The only thing I could think of is that it might have something to do
with the enabled SIGTERM signal, even though the module wasn't being
unloaded at the time the bug occurred.

The system is FC4 based with a 2.6.17 kernel (can't change).

So I'm sort of out of ideas and hope someone here has an idea of what
might have gone wrong.

Timm