From: Bob Peterson
Date: Tue, 19 Mar 2019 14:48:29 -0400 (EDT)
Subject: [Cluster-devel] Assertion failure: sdp->sd_log_blks_free <= sdp->sd_jdesc->jd_blocks
Message-ID: <745330930.13931854.1553021309310.JavaMail.zimbra@redhat.com>
To: cluster-devel.redhat.com

----- Original Message -----
> Hi,
>
> Occasionally during testing, we see the following assertion failure in
> log_pull_tail():
>
> [ 1104.061245] gfs2: fsid=xapi-clusterd:2d2cc24c-c48a-ca.0: fatal:
> assertion "atomic_read(&sdp->sd_log_blks_free) <=
> sdp->sd_jdesc->jd_blocks" failed
> [ 1104.061245] function = log_pull_tail, file = fs/gfs2/log.c, line = 510
>
> It always seems to happen shortly after journal recovery. I added some
> debug logging at the point of the assertion failure and got the following:
(snip)
> Any ideas about this?
>
> Thanks,
> --
> Ross Lagerwall

Hi Ross,

I've been fighting with and debugging multiple recovery problems for a long
time now. I've done countless (well, okay, thousands of) recovery tests, and
I can tell you that gfs2 recovery has some major problems. These problems
usually don't occur when you have only a few gfs2 mounts; they're much more
likely when you have lots of them. I'm using 32 mounts.

The problem you mentioned sounds vaguely familiar, but I can't find anything
directly related. Make sure all your journals are the same size, and see
whether fsck.gfs2 complains about the journal. Otherwise, it could be a side
effect of one of the recovery issues I'm working on. Do you have any other
symptoms?

Also, make sure multiple nodes aren't trying to use the same journal because
of lock_nolock or something... I've made that mistake in the past.

Regards,

Bob Peterson
Red Hat File Systems
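
For context on what the failing check enforces, here is a minimal userspace C
sketch of the invariant, built only from what the quoted assertion shows: when
the log tail is pulled forward, the free-block count grows by the distance the
tail moved and must never exceed the journal size (jd_blocks). The struct,
function names, and numbers below are illustrative and are not the kernel's
implementation (in the kernel, sd_log_blks_free is an atomic counter).

#include <assert.h>
#include <stdio.h>

/* Illustrative model of per-journal log accounting (not kernel code). */
struct log_state {
    unsigned int jd_blocks;      /* total blocks in this journal */
    unsigned int log_blks_free;  /* blocks currently free for new log data */
    unsigned int log_tail;       /* current tail position (block index) */
};

/* Distance the tail moved, wrapping around the circular journal. */
static unsigned int log_distance(const struct log_state *st,
                                 unsigned int newer, unsigned int older)
{
    return (newer >= older) ? newer - older
                            : newer + st->jd_blocks - older;
}

/* Pulling the tail forward frees the blocks between the old and new tail. */
static void pull_tail(struct log_state *st, unsigned int new_tail)
{
    unsigned int dist = log_distance(st, new_tail, st->log_tail);

    st->log_blks_free += dist;
    /* The invariant from the report: free blocks can never exceed the
     * journal size.  Tripping it means the free-block accounting and the
     * recorded journal size (jd_blocks) disagree. */
    assert(st->log_blks_free <= st->jd_blocks);

    st->log_tail = new_tail;
}

int main(void)
{
    struct log_state st = {
        .jd_blocks = 32768, .log_blks_free = 32000, .log_tail = 100,
    };
    pull_tail(&st, 400);    /* frees 300 blocks: 32300 <= 32768, OK */
    printf("free=%u of %u\n", st.log_blks_free, st.jd_blocks);
    return 0;
}

In this simplified model, the assertion can only fire if the accounting and
jd_blocks get out of step, which is consistent with the suggestions above to
verify that all journals are the same size and that two mounters aren't
replaying the same journal.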