From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S965134AbWFTHhm (ORCPT ); Tue, 20 Jun 2006 03:37:42 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
    id S965135AbWFTHhm (ORCPT ); Tue, 20 Jun 2006 03:37:42 -0400
Received: from smtp105.mail.mud.yahoo.com ([209.191.85.215]:59287 "HELO
    smtp105.mail.mud.yahoo.com") by vger.kernel.org with SMTP
    id S965134AbWFTHhm (ORCPT ); Tue, 20 Jun 2006 03:37:42 -0400
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
    s=s1024; d=yahoo.com.au;
    h=Received:Message-ID:Date:From:User-Agent:X-Accept-Language:MIME-Version:To:CC:Subject:References:In-Reply-To:Content-Type:Content-Transfer-Encoding;
    b=ESozIkQLlEROZPym2t6JzzZr0e2LSbEIp3anAOqOYULcxPuOexrupa6cULPGI3YnJaWiL/VJG7wGsO8cpb0Sxw07qUEfelR4jv3t2X3g4GF75deWxixmSGC3u+VGrf/Ib7+VXkZWpEIOoSNzHUDKXrNsDnKBCXQXsMPCHkU80uc= ;
Message-ID: <4497A5BC.4070005@yahoo.com.au>
Date: Tue, 20 Jun 2006 17:37:32 +1000
From: Nick Piggin
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20051007 Debian/1.7.12-1
X-Accept-Language: en
MIME-Version: 1.0
To: Andrew Morton
CC: Dave Olson, mingo@elte.hu, ccb@acm.org, linux-kernel@vger.kernel.org
Subject: Re: [patch] increase spinlock-debug looping timeouts (write_lock and NMI)
References: <20060619233947.94f7e644.akpm@osdl.org>
In-Reply-To: <20060619233947.94f7e644.akpm@osdl.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Andrew Morton wrote:
> On Mon, 19 Jun 2006 22:35:46 -0700 (PDT)
> Dave Olson wrote:
>
>
>>|
>>| I get that impression ;)  If it takes 1-2 seconds to get this lock then it
>>| can take five seconds.  a) that's just gross and b) the NMI watchdog will
>>| nuke the box.
>>|
>>| Why is it taking so long to get the lock?
>>|
>>| Does it happen in non-debug mode?
>>|
>>| What do we do about it?
>>
>>It seems possible that this might be the cause of problems we've had
>>with our InfiniPath hardware/software, and also Mellanox/OpenIB hardware/software
>>on some quad-socket/dual core opteron systems (8 cpu cores).
>>
>>We'll see very long delays when 8 MPI processes exit "simultaneously", and sometimes
>>get NMI, sometimes system hangs, and sometimes just hung up for many seconds (and
>>often in that state, doing sysrq-P or sysrq-T will make things happy again).
>>
>
>
> OK.  I assume these processes have done a mmap(MAP_SHARED) of a lot of
> memory?
>
>
>>A typical trace looks like this (on an fc4 2.6.16 kernel):
>
>
> fc4?  You seem to have an RH-FCx which doesn't enable
> CONFIG_DEBUG_SPINLOCK.  Or maybe we didn't have all that debug code in
> 2.6.16.  Doesn't matter, really.
>
>
>>[root@quad-00 ~]# NMI Watchdog detected LOCKUP on CPU 0
>>CPU 0
>>Modules linked in: nfs nfsd exportfs lockd nfs_acl ipv6 autofs4 sunrpc ib_sdp(U)
>>ib_cm(U) ib_umad(U) ib_uverbs(U) ib_ipoib(U) ib_sa(U) ib_ipath(U) ib_mad(U)
>>ib_core(U) video button battery ac i2c_nforce2 i2c_core ipath_core(U) e1000
>>floppy sg dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod sata_nv libata aic79xx
>>scsi_transport_spi sd_mod scsi_mod
>>Pid: 4239, comm: mpi_multibw Not tainted 2.6.16-1.2096_FC4.rootsmp #1
>>RIP: 0010:[] {_raw_write_lock+161}
>>RSP: 0018:ffff810078e07c18  EFLAGS: 00000086
>>RAX: 000000008f100300 RBX: ffff81007b7bea58 RCX: 00000000002dc5a0
>>RDX: 0000000000927efd RSI: 0000000000000001 RDI: ffff81007b7bea58
>>RBP: ffff81007b7bea40 R08: ffff810002e3ae80 R09: 00000000fffffffa
>>R10: 0000000000000003 R11: ffffffff801644e2 R12: ffff81007b7bea58
>>R13: 00002aaaad800000 R14: ffff810002e3aec0 R15: 00002aaabba6f000
>>FS:  0000000040a00960(0000) GS:ffffffff80514000(0000) knlGS:00000000f7fc86c0
>>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>CR2: 00000033f38bdaf0 CR3: 0000000000101000 CR4: 00000000000006e0
>>Process mpi_multibw (pid: 4239, threadinfo ffff810078e06000, task ffff810079d8a040)
>>Stack: ffff810002e3aec0 ffffffff8016452b 0000000078ebb067 00002aaaad757000
>>       ffff810078dccab8 ffffffff8016b840 0000000000000000 ffff810078e07d38
>>       ffffffffffffffff 0000000000000000
>>Call Trace: {__set_page_dirty_nobuffers+73}
>>       {unmap_vmas+1042} {exit_mmap+124}
>>       {mmput+37} {do_exit+584}
>>       {__dequeue_signal+459} {sys_exit_group+0}
>>       {get_signal_to_deliver+1568}
>>{do_signal+116}
>>       {__pollwait+0} {sys_select+934}
>>       {sysret_signal+28}
>>{ptregscall_common+103}
>>
>>Code: 84 c0 75 7f f0 81 03 00 00 00 01 f3 90 48 83 c1 01 48 8b 15
>>Kernel panic - not syncing: nmi watchdog Any ideas what it might be waiting on?
>
>
> blam, dead box, that's the one, thanks.
>
> With our current rwlock semantics I don't know if this is fixable.
> Probably we need to go back to a spinlock on tree_lock.

Lockless pagecache makes most of the readside locks go away, so
I have converted tree_lock back to a spinlock in my tree.

I've just started working on it again with a view to submitting
it (or at least the RCU radix tree, to start with)... been having
fun with a userspace RCU for rtth ;)

Otherwise, a straight rwlock->spinlock conversion will have a few
more scalability issues, but I'd guess it wouldn't be a problem at
all for most workloads on most systems.

-- 
SUSE Labs, Novell Inc.
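
For readers following the thread, here is a minimal userspace sketch of why a writer spinning in _raw_write_lock can be starved when readers keep overlapping. It is not the kernel's code. It assumes a bias-counter rwlock in which readers take one unit and a writer must take the whole bias, which is one plausible reading of the "current rwlock semantics" mentioned above. Every name here (demo_rwlock_t, DEMO_RW_BIAS, demo_starved_attempts) is hypothetical and invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bias: the counter holds this value while the lock is free. */
#define DEMO_RW_BIAS 0x01000000

typedef struct { atomic_int count; } demo_rwlock_t;

static void demo_rwlock_init(demo_rwlock_t *l)
{
    atomic_init(&l->count, DEMO_RW_BIAS);
}

/* A reader takes one unit; it fails only if a writer holds the whole bias. */
static bool demo_read_trylock(demo_rwlock_t *l)
{
    if (atomic_fetch_sub(&l->count, 1) - 1 >= 0)
        return true;
    atomic_fetch_add(&l->count, 1);              /* undo the failed attempt */
    return false;
}

static void demo_read_unlock(demo_rwlock_t *l)
{
    atomic_fetch_add(&l->count, 1);
}

/* A writer takes the whole bias; it succeeds only with no holders at all. */
static bool demo_write_trylock(demo_rwlock_t *l)
{
    if (atomic_fetch_sub(&l->count, DEMO_RW_BIAS) - DEMO_RW_BIAS == 0)
        return true;
    atomic_fetch_add(&l->count, DEMO_RW_BIAS);   /* undo the failed attempt */
    return false;
}

static void demo_write_unlock(demo_rwlock_t *l)
{
    atomic_fetch_add(&l->count, DEMO_RW_BIAS);
}

/*
 * Simulate overlapping readers on one thread: the next reader always
 * enters before the previous one leaves, so the polling writer never
 * sees the lock free. Returns the number of failed writer attempts.
 */
static int demo_starved_attempts(demo_rwlock_t *l, int rounds)
{
    int failed = 0;
    bool ok = demo_read_trylock(l);              /* first reader enters */
    assert(ok);
    for (int i = 0; i < rounds; i++) {
        if (demo_write_trylock(l))
            demo_write_unlock(l);                /* not reached in this scenario */
        else
            failed++;                            /* writer keeps spinning */
        ok = demo_read_trylock(l);               /* next reader enters ... */
        assert(ok);
        demo_read_unlock(l);                     /* ... then the previous one leaves */
    }
    demo_read_unlock(l);                         /* last reader leaves */
    return failed;
}
```

Calling demo_starved_attempts(&lock, n) on a freshly initialised lock reports a failed writer attempt in every round, and the writer gets in only once the last reader has left. With a plain spinlock, every contender is treated the same, so a steady stream of readers can no longer lock out a writer as a class. The price is that readers now serialise against each other, which is the scalability cost noted above.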