From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755618AbZHDOv7 (ORCPT ); Tue, 4 Aug 2009 10:51:59 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1755538AbZHDOv6 (ORCPT ); Tue, 4 Aug 2009 10:51:58 -0400 Received: from casper.infradead.org ([85.118.1.10]:53863 "EHLO casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755594AbZHDOvz (ORCPT ); Tue, 4 Aug 2009 10:51:55 -0400 Subject: Re: Incorrect circular locking dependency? From: Peter Zijlstra To: David Howells Cc: Takashi Iwai , Ingo Molnar , Linux filesystem caching discussion list , LKML , Johannes Berg , oleg In-Reply-To: <24075.1248705430@redhat.com> References: <6950.1245661098@redhat.com> <24075.1248705430@redhat.com> Content-Type: text/plain Content-Transfer-Encoding: 7bit Date: Tue, 04 Aug 2009 16:51:26 +0200 Message-Id: <1249397486.7924.243.camel@twins> Mime-Version: 1.0 X-Mailer: Evolution 2.26.1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 2009-07-27 at 15:37 +0100, David Howells wrote: > Takashi Iwai wrote: > > > ======================================================= > > [ INFO: possible circular locking dependency detected ] > > 2.6.30-test #7 > > ------------------------------------------------------- > > swapper/0 is trying to acquire lock: > > (&cwq->lock){-.-...}, at: [] __queue_work+0x1f/0x4e > > > > but task is already holding lock: > > (&q->lock){-.-.-.}, at: [] __wake_up+0x26/0x5c > > > > which lock already depends on the new lock. > > Okay. I think I understand this: > > (1) cachefiles_read_waiter() intercepts wake up events, and, as such, is run > inside the waitqueue spinlock for the page bit waitqueue. > > (2) cachefiles_read_waiter() calls fscache_enqueue_retrieval() which calls > fscache_enqueue_operation() which calls schedule_work() for fast > operations, thus taking a per-CPU workqueue spinlock. > > (3) queue_work(), which is called by many things, calls __queue_work(), which > takes the per-CPU workqueue spinlock. > > (4) __queue_work() then calls insert_work(), which calls wake_up(), which > takes the waitqueue spinlock for the per-CPU workqueue waitqueue. > > Even though the two waitqueues are separate, I think lockdep sees them as > having the same lock. Yeah, it looks like cwq->lock is always in the same lock class. Creating a new class for your second workqueue might help, we'd have to pass a second key through __create_workqueue_key() and pass that into init_cpu_workqueue() and apply it to cwq->lock using lockdep_set_class() and co. ~ Peter > --- > > the existing dependency chain (in reverse order) is: > > > > -> #1 (&q->lock){-.-.-.}: > > [] __lock_acquire+0xfd6/0x12d5 > > [] lock_acquire+0xb7/0xeb > > [] _spin_lock_irqsave+0x3d/0x5e > > [] __wake_up+0x26/0x5c > > [] insert_work+0x7b/0x95 > > [] __queue_work+0x2e/0x4e > > [] delayed_work_timer_fn+0x3c/0x4f > > [] run_timer_softirq+0x180/0x206 > > [] __do_softirq+0xc3/0x18d > > [] do_softirq+0x44/0x7a > > [] irq_exit+0x43/0x87 > > [] smp_apic_timer_interrupt+0x7c/0x9b > > [] apic_timer_interrupt+0x36/0x40 > > [] cpu_idle+0xa2/0xbe > > [] rest_init+0x66/0x79 > > [] start_kernel+0x396/0x3ae > > [] __init_begin+0x7f/0x98 > > [] 0xffffffff > > > > -> #0 (&cwq->lock){-.-...}: > > [] __lock_acquire+0xd26/0x12d5 > > [] lock_acquire+0xb7/0xeb > > [] _spin_lock_irqsave+0x3d/0x5e > > [] __queue_work+0x1f/0x4e > > [] queue_work_on+0x48/0x63 > > [] queue_work+0x23/0x38 > > [] schedule_work+0x1e/0x31 > > [] fscache_enqueue_operation+0xc5/0x102 [fscache] > > [] cachefiles_read_waiter+0xb3/0xcd [cachefiles] > > [] __wake_up_common+0x4c/0x85 > > [] __wake_up+0x38/0x5c > > [] __wake_up_bit+0x34/0x4b > > [] unlock_page+0x55/0x6a > > [] mpage_end_io_read+0x4e/0x71 > > [] bio_endio+0x31/0x44 > > [] req_bio_endio+0xab/0xde > > [] blk_update_request+0x17d/0x321 > > [] blk_update_bidi_request+0x22/0x62 > > [] blk_end_bidi_request+0x25/0x6e > > [] blk_end_request+0x1a/0x30 > > [] scsi_io_completion+0x193/0x3bb [scsi_mod] > > [] scsi_finish_command+0xd9/0xf2 [scsi_mod] > > [] scsi_softirq_done+0xf4/0x10d [scsi_mod] > > [] blk_done_softirq+0x6f/0x8e > > [] __do_softirq+0xc3/0x18d > > [] do_softirq+0x44/0x7a > > [] irq_exit+0x43/0x87 > > [] do_IRQ+0x8d/0xb2 > > [] common_interrupt+0x35/0x40 > > [] cpu_idle+0xa2/0xbe > > [] rest_init+0x66/0x79 > > [] start_kernel+0x396/0x3ae > > [] __init_begin+0x7f/0x98 > > [] 0xffffffff > > > > other info that might help us debug this: > > > > 1 lock held by swapper/0: > > #0: (&q->lock){-.-.-.}, at: [] __wake_up+0x26/0x5c > > > > stack backtrace: > > Pid: 0, comm: swapper Not tainted 2.6.30-test #7 > > Call Trace: > > [] ? printk+0x1d/0x33 > > [] print_circular_bug_tail+0xaf/0xcb > > [] __lock_acquire+0xd26/0x12d5 > > [] ? __queue_work+0x1f/0x4e > > [] lock_acquire+0xb7/0xeb > > [] ? __queue_work+0x1f/0x4e > > [] ? __queue_work+0x1f/0x4e > > [] _spin_lock_irqsave+0x3d/0x5e > > [] ? __queue_work+0x1f/0x4e > > [] __queue_work+0x1f/0x4e > > [] queue_work_on+0x48/0x63 > > [] queue_work+0x23/0x38 > > [] schedule_work+0x1e/0x31 > > [] fscache_enqueue_operation+0xc5/0x102 [fscache] > > [] cachefiles_read_waiter+0xb3/0xcd [cachefiles] > > [] __wake_up_common+0x4c/0x85 > > [] __wake_up+0x38/0x5c > > [] __wake_up_bit+0x34/0x4b > > [] unlock_page+0x55/0x6a > > [] mpage_end_io_read+0x4e/0x71 > > [] bio_endio+0x31/0x44 > > [] req_bio_endio+0xab/0xde > > [] blk_update_request+0x17d/0x321 > > [] blk_update_bidi_request+0x22/0x62 > > [] blk_end_bidi_request+0x25/0x6e > > [] blk_end_request+0x1a/0x30 > > [] scsi_io_completion+0x193/0x3bb [scsi_mod] > > [] ? trace_hardirqs_on+0x19/0x2c > > [] ? scsi_device_unbusy+0x92/0xaa [scsi_mod] > > [] scsi_finish_command+0xd9/0xf2 [scsi_mod] > > [] scsi_softirq_done+0xf4/0x10d [scsi_mod] > > [] blk_done_softirq+0x6f/0x8e > > [] __do_softirq+0xc3/0x18d > > [] do_softirq+0x44/0x7a > > [] irq_exit+0x43/0x87 > > [] do_IRQ+0x8d/0xb2 > > [] common_interrupt+0x35/0x40 > > [] ? mwait_idle+0x98/0xec > > [] cpu_idle+0xa2/0xbe > > [] rest_init+0x66/0x79 > > [] start_kernel+0x396/0x3ae > > [] __init_begin+0x7f/0x98