From mboxrd@z Thu Jan 1 00:00:00 1970
From: Fabio Massimo Di Nitto
Date: Thu, 01 Nov 2007 03:30:01 -0400
Subject: [Cluster-devel] [GFS2] Git tree update
In-Reply-To: <1193844858.1068.516.camel@quoit>
References: <1193646556.1068.459.camel@quoit> <4725B510.7060902@ubuntu.com> <1193844858.1068.516.camel@quoit>
Message-ID: <47298079.9010209@ubuntu.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Steven Whitehouse wrote:
> Hi,
>
> On Mon, 2007-10-29 at 11:25 +0100, Fabio Massimo Di Nitto wrote:
>> Steven Whitehouse wrote:
>>> Hi,
>>>
>>> Since Linus released -rc1 when I was on holiday last week, I've just
>>> updated the git tree. There have been quite a lot of changes in the core
>>> kernel which affect GFS2 but which were not part of the GFS2 -nmw git
>>> tree in this last merge window, so that's another reason for updating so
>>> that we have all those changes in the -nmw tree now,
>>>
>>> Steve.
>>>
>> Hi Steve,
>>
>> I am still getting a bunch of OOPSes at mount time with this very latest tree.
>>
>> Last commit in -nmw: 9904cd0ecaf9c72de837214fa1b359d4c87220a8
>>
>> dmesg in attachment.
>>
>> Cheers
>> Fabio
>>
> This is a bit odd. I've been trying to track this down, but I've not
> been able to reproduce it here. The odd thing is that it appears from
> the context that the spin lock has been unlocked by something inside the
> call to run_queue() (from gfs2_glmutex_unlock()) but I can't spot where
> that has happened.

Could it be related to the setup I am using or the kernel config? This is a
testbed, so I can do virtually anything to debug and test.

The device is one 5 GB AoE disk. The cluster is three VMware Workstation
machines. The config is as simple as it can be to run such a cluster.

> Even more odd, the original report hit a different area of the code: the
> check for the spinlock in gfs2_demote_wake().
> The reason that particular
> one stood out is that according to the stack trace it was called from
> drop_bh() where the preceding function call was to lock that very lock.
> So in that case it looks like a race with something else getting to the
> glock's spin lock in the space of the call to that function. If you are
> running with spin lock debugging, that means that the spin lock must
> have been initialised correctly, otherwise that would have flagged up
> earlier, so it looks like it's been overwritten, or that the glock has
> been freed underneath it, both of which events should be impossible.

I did try to look at the code, with very little luck. I could smell the race,
as I found the lock call right before the code that OOPSes, but I have no idea
what could have unlocked it.

I am traveling until the 11th of Nov with very little network connectivity, but
the next steps are to try a clean 2.6.24-* without -nmw and start bisecting the
tree to see when this problem was introduced. Assuming it's not AoE causing it.

>
> So I'm still looking into this, but slightly confused at the moment. Has
> anybody else seen anything similar? I still can't figure out why you see
> this and I don't,

Not that I know of. I doubt there are that many people testing gfs2 "crack of
the day"; the only reason I was there was to test gfs1, and I told myself: what
the heck, let's give it a spin.

Speaking of which, gfs1 works fine, so at least the locking code in gfs2 that
is shared between the two looks good.

Fabio