From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S262814AbVAKQJm (ORCPT ); Tue, 11 Jan 2005 11:09:42 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S262813AbVAKQJm (ORCPT ); Tue, 11 Jan 2005 11:09:42 -0500 Received: from penguin.cohaesio.net ([212.97.129.34]:55739 "EHLO mail.cohaesio.net") by vger.kernel.org with ESMTP id S262816AbVAKQJO convert rfc822-to-8bit (ORCPT ); Tue, 11 Jan 2005 11:09:14 -0500 From: Anders Saaby Organization: Cohaesio A/S To: linux-kernel@vger.kernel.org Subject: Re: panic - Attempting to free lock with active block list Date: Tue, 11 Jan 2005 17:09:42 +0100 User-Agent: KMail/1.7.2 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8BIT Content-Disposition: inline Message-Id: <200501111709.42934.as@cohaesio.com> X-OriginalArrivalTime: 11 Jan 2005 16:09:11.0596 (UTC) FILETIME=[E3515EC0:01C4F7F7] Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Hi Myklebust(s) :) I have seen the exact same error on one of my webservers which is serving from an NFS export and under heavy load. ~2 hours uptime before panic'ing. I then tried Trond's patch which seems to work. 14 hours of uptime now. :) Anyways, I have a couple of issues you might be able to clear up for me: First issue: New strange message in the kernel log: "nlmclnt_lock: VFS is out of sync with lock manager!" - What does this mean? - Is it bad?, What can i do? Second issue: my fs/nfs/file.c doesn't look like yours (Vanilla 2.6.10):         status = NFS_PROTO(inode)->lock(filp, cmd, fl);         /* If we were signalled we still need to ensure that          * we clean up any state on the server. We therefore          * record the lock call as having succeeded in order to          * ensure that locks_remove_posix() cleans it out when          * the process exits.          */         if (status == -EINTR || status == -ERESTARTSYS)                 posix_lock_file_wait(filp, fl);         unlock_kernel();         if (status < 0)                 return status;         /*          * Make sure we clear the cache whenever we try to get the lock.          * This makes locking act as a cache coherency point.          */         filemap_fdatawrite(filp->f_mapping);         down(&inode->i_sem);         nfs_wb_all(inode);      /* we may have slept */         up(&inode->i_sem);         filemap_fdatawait(filp->f_mapping);         nfs_zap_caches(inode);         return 0; So... Am I missing another patch or something else? Jan-Frode Myklebust wrote: > On Wed, Jan 05, 2005 at 10:54:03PM +0100, Trond Myklebust wrote: >> >> Looking at the NFS code, I can attempt a wild guess about what may be >> happening: there may be a race when pressing ^C in the middle of a >> blocking NFS lock RPC call, and if so, the following patch will fix it. > > > A whopping 9 hours of uptime now :) So the one-liner patch seems to have > fixed it. > > Thanks! > >> - posix_lock_file(filp, fl); >> + posix_lock_file_wait(filp, fl); > > > -jf -- Med venlig hilsen - Best regards - Meilleures salutations Anders Saaby Systems Engineer ------------------------------------------------ Cohaesio A/S - Maglebjergvej 5D - DK-2800 Lyngby Phone: +45 45 880 888 - Fax: +45 45 880 777 Mail: as@cohaesio.com - http://www.cohaesio.com ------------------------------------------------