From: Bob Peterson
Date: Tue, 19 Mar 2019 14:48:29 -0400 (EDT)
Subject: [Cluster-devel] Assertion failure: sdp->sd_log_blks_free <= sdp->sd_jdesc->jd_blocks
Message-ID: <745330930.13931854.1553021309310.JavaMail.zimbra@redhat.com>
To: cluster-devel.redhat.com

----- Original Message -----
> Hi,
>
> Occasionally during testing, we see the following assertion failure in
> log_pull_tail():
>
> [ 1104.061245] gfs2: fsid=xapi-clusterd:2d2cc24c-c48a-ca.0: fatal:
> assertion "atomic_read(&sdp->sd_log_blks_free) <=
> sdp->sd_jdesc->jd_blocks" failed
> [ 1104.061245] function = log_pull_tail, file = fs/gfs2/log.c, line = 510
>
> It always seems to happen shortly after journal recovery. I added some
> debug logging at the point of the assertion failure and got the following:
(snip)
> Any ideas about this?
>
> Thanks,
> --
> Ross Lagerwall

Hi Ross,

I've been fighting with and debugging multiple recovery problems for a long
time now. I've done countless (well, okay, thousands of) recovery tests, and
I can tell you that gfs2 recovery has some major problems. These problems
usually don't occur when you have only a few gfs2 mounts; they're much more
likely when you have lots of them. I'm using 32 mounts.

The problem you mentioned sounds vaguely familiar, but I can't find anything
directly related. Make sure all your journals are the same size, and see
whether fsck.gfs2 complains about the journal. Otherwise, it could be a side
effect of one of the recovery issues I'm working on. Do you have any other
symptoms?

Also, make sure multiple nodes aren't trying to use the same journal because
of lock_nolock or something... I've made that mistake in the past.

Regards,

Bob Peterson
Red Hat File Systems
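
For context on what the failing check enforces, here is a minimal userspace C
sketch of the invariant, built only from what the quoted assertion shows: when
the log tail is pulled forward, the free-block count grows by the distance the
tail moved and must never exceed the journal size (jd_blocks). The struct,
function names, and numbers below are illustrative and are not the kernel's
implementation (in the kernel, sd_log_blks_free is an atomic counter).

#include <assert.h>
#include <stdio.h>

/* Illustrative model of per-journal log accounting (not kernel code). */
struct log_state {
    unsigned int jd_blocks;      /* total blocks in this journal */
    unsigned int log_blks_free;  /* blocks currently free for new log data */
    unsigned int log_tail;       /* current tail position (block index) */
};

/* Distance the tail moved, wrapping around the circular journal. */
static unsigned int log_distance(const struct log_state *st,
                                 unsigned int newer, unsigned int older)
{
    return (newer >= older) ? newer - older
                            : newer + st->jd_blocks - older;
}

/* Pulling the tail forward frees the blocks between the old and new tail. */
static void pull_tail(struct log_state *st, unsigned int new_tail)
{
    unsigned int dist = log_distance(st, new_tail, st->log_tail);

    st->log_blks_free += dist;
    /* The invariant from the report: free blocks can never exceed the
     * journal size.  Tripping it means the free-block accounting and the
     * recorded journal size (jd_blocks) disagree. */
    assert(st->log_blks_free <= st->jd_blocks);

    st->log_tail = new_tail;
}

int main(void)
{
    struct log_state st = {
        .jd_blocks = 32768, .log_blks_free = 32000, .log_tail = 100,
    };
    pull_tail(&st, 400);    /* frees 300 blocks: 32300 <= 32768, OK */
    printf("free=%u of %u\n", st.log_blks_free, st.jd_blocks);
    return 0;
}

In this simplified model, the assertion can only fire if the accounting and
jd_blocks get out of step, which is consistent with the suggestions above to
verify that all journals are the same size and that two mounters aren't
replaying the same journal.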