From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754347AbWKVMWy (ORCPT ); Wed, 22 Nov 2006 07:22:54 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1755084AbWKVMWy (ORCPT ); Wed, 22 Nov 2006 07:22:54 -0500
Received: from port254.ds1-he.adsl.cybercity.dk ([217.157.198.197]:37220 "EHLO gere.msconsult.dk") by vger.kernel.org with ESMTP id S1754341AbWKVMWx convert rfc822-to-8bit (ORCPT ); Wed, 22 Nov 2006 07:22:53 -0500
From: rasmus@msconsult.dk (Rasmus =?utf-8?Q?B=C3=B8g?= Hansen)
To: rasmus@msconsult.dk (Rasmus =?utf-8?Q?B=C3=B8g?= Hansen)
Cc: Oleg Verych , LKML , Andrew Morton
Subject: Re: smbfs (Re: BUG: soft lockup detected on CPU#0! (2.6.18.2))
References: <867ixyvum6.fsf@gere.msconsult.dk> <86odr6f55x.fsf@gere.msconsult.dk>
Date: Wed, 22 Nov 2006 13:22:25 +0100
Message-ID: <86ac2jekn2.fsf@sif.msconsult.dk>
User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/21.4 (gnu/linux)
Organization: MS Consult
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8BIT
X-MSC-MailScanner: Found to be clean
X-MSC-MailScanner-From: rasmus@msconsult.dk
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

rasmus@msconsult.dk (Rasmus Bøg Hansen) writes:

> Oleg Verych writes:
>
>> [ Adding e-mail of Andrew Morton, he may have clue about who to ping ;]
>> [ MAINTAINERS.smbfs seems to be empty ]
>>
>> On 2006-11-14, Rasmus Bøg Hansen wrote:
>> []
>>> [1.] One line summary of the problem:
>>>
>>> Kernel BUG's and freezes after a soft lockup.
>>>
>>> [2.] Full description of the problem/report:
>>>
>>> The night before sunday, my server froze. It was entirely dead and had
>>> to be power cycled. There was no serial console connected but it
>>> managed to log a short BUG before, which seems related to smbfs.
>>>
>>> As it happened in the night, I am unsure what triggered the bug, but
>>> it was during the nightly backup routines, which include running
>>> rsync over ssh (over ADSL so pretty slow) and writing some large
>>> .tar.bz2 to a smbfs drive. I assume (but do not know for sure) that it
>>> was the last one that triggered the bug.
>>
>> Nobody seems to have picked this up. So.
>> Why don't you try debian's kernel 2.6.18 from unstable?
>
> I haven't tried that, partially to get rid of the initrd and have
> drivers for the controllers/disks in-kernel, partially because I like
> having built my own kernel. That might be silly, of course - I can try
> the Debian kernel, if you think it would make a difference.
>
>> I see, you've built it yourself, then try to enable some more locking
>> debugging in the "kernel hacking" section.
>
> I am now running with all lock debugging turned on. "Unfortunately"
> the machine has been running stable since the crash last weekend -
> perhaps the extra-large backup routines on sunday may trigger it
> again...
>
>> (gitweb down, i can't check history of smbfs, and i have amd64 arch, anyway)
>>> Nov 12 03:54:57 gere kernel: BUG: soft lockup detected on CPU#0!
>>> Nov 12 03:54:57 gere kernel: [softlockup_tick+170/195] softlockup_tick+0xaa/0xc3
>>> Nov 12 03:54:57 gere kernel: [update_process_times+56/137] update_process_times+0x38/0x89
>>> Nov 12 03:54:57 gere kernel: [smp_apic_timer_interrupt+105/117] smp_apic_timer_interrupt+0x69/0x75
>>> Nov 12 03:54:57 gere kernel: [smbiod+238/348] smbiod+0xee/0x15c
>> this
>>
>>> Nov 12 03:54:57 gere kernel: [apic_timer_interrupt+31/36] apic_timer_interrupt+0x1f/0x24
>>> Nov 12 03:54:57 gere kernel: [journal_init_revoke+49/678] journal_init_revoke+0x31/0x2a6
>>> Nov 12 03:54:57 gere kernel: [smbiod+238/348] smbiod+0xee/0x15c
>> and this *may be* double (un)lock.
>
> Hopefully lock debugging will tell.

I got this - I think it was this morning (somehow kernel logging was
disabled so I can't tell the exact time):

----
=============================================
[ INFO: possible recursive locking detected ]
---------------------------------------------
nfsd/1788 is trying to acquire lock:
 (&inode->i_mutex){--..}, at: [] mutex_lock+0x8/0xa

but task is already holding lock:
 (&inode->i_mutex){--..}, at: [] mutex_lock+0x8/0xa

other info that might help us debug this:
2 locks held by nfsd/1788:
 #0: (hash_sem){----}, at: [] exp_readlock+0x12/0x16 [nfsd]
 #1: (&inode->i_mutex){--..}, at: [] mutex_lock+0x8/0xa

stack backtrace:
 [] show_trace+0x27/0x2b
 [] dump_stack+0x26/0x2a
 [] print_deadlock_bug+0xb5/0xba
 [] check_deadlock+0x61/0x71
 [] __lock_acquire+0x334/0x9b6
 [] lock_acquire+0x75/0x90
 [] __mutex_lock_slowpath+0x85/0x270
 [] mutex_lock+0x8/0xa
 [] nfsd_setattr+0x3ca/0x574 [nfsd]
 [] nfsd_create_v3+0x3a6/0x540 [nfsd]
 [] nfsd3_proc_create+0x118/0x161 [nfsd]
 [] nfsd_dispatch+0xd8/0x1ff [nfsd]
 [] svc_process+0x4e5/0x6da [sunrpc]
 [] nfsd+0x1cc/0x35e [nfsd]
 [] kernel_thread_helper+0x5/0xb
----

Also it doesn't seem to be related to smbfs AFAICS - I am not enough
into lock debugging to say if this is relevant, useful or just
noise...

Regards
/Rasmus

-- 
Rasmus Bøg Hansen
MSC Aps
Bøgesvinget 8
2740 Skovlunde
44 53 93 66