From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Teigland
Date: Thu, 12 Nov 2009 11:14:57 -0600
Subject: [Cluster-devel] Re: [PATCH 2/2] dlm: Add down/up_write_non_owner to keep lockdep happy
In-Reply-To: <1258036158.6052.874.camel@localhost.localdomain>
References: <1255445776-3112-1-git-send-email-swhiteho@redhat.com> <1255445776-3112-2-git-send-email-swhiteho@redhat.com> <1255445776-3112-3-git-send-email-swhiteho@redhat.com> <20091112142211.GA468@elte.hu> <1258036158.6052.874.camel@localhost.localdomain>
Message-ID: <20091112171457.GD20714@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Thu, Nov 12, 2009 at 02:29:18PM +0000, Steven Whitehouse wrote:
> Hi,
>
> On Thu, 2009-11-12 at 15:22 +0100, Ingo Molnar wrote:
> > * Steven Whitehouse wrote:
> >
> > > I looked at possibly changing this to use completions, but
> > > it seems that the usage here is not easily adapted to that.
> > > This patch adds suitable annotation to the write side of
> > > the ls_in_recovery semaphore so that we don't get nasty
> > > messages from lockdep when mounting a gfs2 filesystem.
> >
> > What do those 'nasty messages' say? If they expose some bug and this
> > patch works around that bug by hiding it then NAK ...
> >
> > Ingo
> >
> The nasty messages are moaning that the lock is being taken in one
> thread and unlocked in another. I couldn't see any bugs in the code when
> I looked at it. Below are the messages that I get - to reproduce just
> mount a GFS2 filesystem with the dlm lock manager. It happens on every
> mount,
>
> Steve.
>
> Nov 12 15:10:01 chywoon kernel:
> =============================================
> Nov 12 15:10:01 chywoon kernel: [ INFO: possible recursive locking
> detected ]

That recursive locking trace is something different.
up_write_non_owner() addresses this trace, which as you say, is from doing the
down and up from different threads (which is the intention):

Nov 11 16:50:08 bull-02 kernel: GFS2: fsid=bull:foo.1: Joined cluster. Now mounting FS...
Nov 11 16:50:09 bull-02 kernel:
Nov 11 16:50:09 bull-02 kernel: =====================================
Nov 11 16:50:09 bull-02 kernel: [ BUG: bad unlock balance detected! ]
Nov 11 16:50:09 bull-02 kernel: -------------------------------------
Nov 11 16:50:09 bull-02 kernel: dlm_recoverd/8958 is trying to release lock (&ls->ls_in_recovery) at:
Nov 11 16:50:09 bull-02 kernel: [] dlm_recoverd+0x323/0x4ce [dlm]
Nov 11 16:50:09 bull-02 kernel: but there are no more locks to release!
Nov 11 16:50:09 bull-02 kernel:
Nov 11 16:50:09 bull-02 kernel: other info that might help us debug this:
Nov 11 16:50:09 bull-02 kernel: 3 locks held by dlm_recoverd/8958:
Nov 11 16:50:09 bull-02 kernel: #0: (&ls->ls_recoverd_active){......}, at: [] dlm_recoverd+0xf0/0x4ce [dlm]
Nov 11 16:50:09 bull-02 kernel: #1: (&ls->ls_recv_active){......}, at: [] dlm_recoverd+0x2fd/0x4ce [dlm]
Nov 11 16:50:09 bull-02 kernel: #2: (&ls->ls_recover_lock){......}, at: [] dlm_recoverd+0x305/0x4ce [dlm]
Nov 11 16:50:09 bull-02 kernel:
Nov 11 16:50:09 bull-02 kernel: stack backtrace:
Nov 11 16:50:09 bull-02 kernel: Pid: 8958, comm: dlm_recoverd Not tainted 2.6.32-rc5 #2
Nov 11 16:50:09 bull-02 kernel: Call Trace:
Nov 11 16:50:09 bull-02 kernel: [] ? dlm_recoverd+0x323/0x4ce [dlm]
Nov 11 16:50:09 bull-02 kernel: [] print_unlock_inbalance_bug+0xd6/0xe0
Nov 11 16:50:09 bull-02 kernel: [] lock_release_non_nested+0xbe/0x259
Nov 11 16:50:09 bull-02 kernel: [] ? dlm_recoverd+0x323/0x4ce [dlm]
Nov 11 16:50:09 bull-02 kernel: [] ? dlm_recoverd+0x323/0x4ce [dlm]
Nov 11 16:50:09 bull-02 kernel: [] lock_release+0x14a/0x16c
Nov 11 16:50:09 bull-02 kernel: [] up_write+0x1e/0x2d
Nov 11 16:50:09 bull-02 kernel: [] dlm_recoverd+0x323/0x4ce [dlm]
Nov 11 16:50:09 bull-02 kernel: [] ? dlm_recoverd+0x0/0x4ce [dlm]
Nov 11 16:50:09 bull-02 kernel: [] kthread+0x7d/0x85
Nov 11 16:50:09 bull-02 kernel: [] child_rip+0xa/0x20
Nov 11 16:50:09 bull-02 kernel: [] ? kthreadd+0xc2/0xe3
Nov 11 16:50:09 bull-02 kernel: [] ? kthread+0x0/0x85
Nov 11 16:50:09 bull-02 kernel: [] ? child_rip+0x0/0x20
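
To illustrate the pattern behind the "bad unlock balance" report, here is a
rough userspace analogy. It is not DLM or kernel code, and it assumes nothing
beyond Python's standard threading module. An RLock records its owning thread
and refuses a release from any other thread, loosely like the ownership check
that fires on the up_write() in the trace. A plain Lock records no owner, so
taking it in one thread and releasing it in another is allowed, which is
roughly the semantics a non_owner annotation declares to be intentional. The
helper name release_from_other_thread is hypothetical.

```python
import threading


def release_from_other_thread(lock):
    """Acquire `lock` in the calling thread, then try to release it from a
    second thread. Returns "released" if the hand-off was allowed and
    "rejected" if the lock refused a release by a non-owner."""
    outcome = {}

    def releaser():
        # Runs in a different thread than the one that acquired the lock.
        try:
            lock.release()
            outcome["result"] = "released"
        except RuntimeError:
            # Owner-tracking locks (RLock) raise here.
            outcome["result"] = "rejected"

    lock.acquire()
    worker = threading.Thread(target=releaser)
    worker.start()
    worker.join()

    if outcome["result"] == "rejected":
        # Still held by this thread; release it from the owner to clean up.
        lock.release()
    return outcome["result"]
```

Calling the helper with threading.Lock() exercises the cross-thread hand-off,
while threading.RLock() exercises the owner check.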