From: Bob Peterson
Date: Mon, 25 Mar 2019 11:29:17 -0400 (EDT)
Subject: [Cluster-devel] Assertion failure: sdp->sd_log_blks_free <= sdp->sd_jdesc->jd_blocks
To: cluster-devel.redhat.com

----- Original Message -----
> I think I've found the cause of the assertion I was hitting. Recovery
> sets sd_log_flush_head but does not take any locks, which means a
> concurrent call to gfs2_log_flush() can result in sd_log_head being set
> to sd_log_flush_head. A later call to gfs2_log_flush() will then hit an
> assertion failure in log_pull_tail(), because the mismatch between
> sd_log_head and sd_log_tail means too many blocks are freed.
>
> I've worked around it by taking the log_flush lock in the patch below,
> and it seems to avoid the problem. However, tracing the recovery process
> I see that it sets sd_log_flush_head and then calls clean_journal() ->
> gfs2_write_log_header() -> gfs2_log_bmap() -> gfs2_log_incr_head(). This
> has:
>
>     BUG_ON((sdp->sd_log_flush_head == sdp->sd_log_tail) &&
>            (sdp->sd_log_flush_head != sdp->sd_log_head));
>
> ... but sd_log_tail and sd_log_head have not been set by
> gfs2_recover_func(), so it might still BUG_ON() during recovery if
> you're particularly unlucky.
>
> I had a look at your "GFS2: Withdraw corruption patches [V2]" series,
> but I didn't see anything that might fix this.
>
> If you think this patch is useful, I can send it as a proper patch
> to the list.
>
> Thanks,
> Ross
>
> --------------
>
> gfs2: Take log_flush lock during recovery
>
> Recovery sets sd_log_flush_head but does not take any locks, which means
> a concurrent call to gfs2_log_flush can result in sd_log_head being set
> to sd_log_flush_head. A later call to gfs2_log_flush will then hit an
> assertion failure in log_pull_tail because the mismatch between
> sd_log_head and sd_log_tail means too many blocks are freed.
>
> gfs2: fsid=xapi-clusterd:88a31b8e-4072-b0.1: fatal: assertion
> "atomic_read(&sdp->sd_log_blks_free) <= sdp->sd_jdesc->jd_blocks" failed
> function = log_pull_tail, file = fs/gfs2/log.c, line = 510

Hi Ross,

I think you found a valid bug, but that's not the proper solution. The
reason is: journal replay and journal flushing should both be protected
by the exclusive (EX) glock taken on the journal itself.

I think the problem may actually be a regression introduced by patch
588bff95c94ef. Because of that patch, function clean_journal() now sets
sdp->sd_log_flush_head, but its caller, gfs2_recover_func(), is used to
recover any node's journal, not just its own. The bug is that
clean_journal() should set sd_log_flush_head if, and only if, it's
replaying its own journal, not another node's. Setting sd_log_flush_head
while replaying another node's journal can only lead to problems like
this one.

I'll try to whip up another patch, and perhaps you can test it for me.
FWIW, I've never seen this problem manifest in my recovery tests, but it
might still be causing some of the weird problems I'm seeing.

Regards,

Bob Peterson
Red Hat File Systems
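
For reference, here is a minimal sketch of the locking workaround Ross
describes: holding sd_log_flush_lock (the rw_semaphore that
gfs2_log_flush() takes) while recovery repositions sd_log_flush_head.
This is hypothetical and untested; the clean_journal() body is
paraphrased from the fs/gfs2 code of that era, not copied from the
actual patch:

    /* fs/gfs2/recovery.c (sketch, not the actual patch) */
    static void clean_journal(struct gfs2_jdesc *jd,
                              struct gfs2_log_header_host *head)
    {
            struct gfs2_sbd *sdp = GFS2_SB(jd->jd_inode);

            /* Exclude a concurrent gfs2_log_flush() so it cannot copy
             * recovery's sd_log_flush_head value into sd_log_head. */
            down_write(&sdp->sd_log_flush_lock);
            sdp->sd_log_flush_head = head->lh_blkno;
            gfs2_replay_incr_blk(jd, &sdp->sd_log_flush_head);
            gfs2_write_log_header(sdp, jd, head->lh_sequence + 1, 0,
                                  GFS2_LOG_HEAD_UNMOUNT |
                                  GFS2_LOG_HEAD_RECOVERY,
                                  REQ_PREFLUSH | REQ_FUA |
                                  REQ_META | REQ_SYNC);
            up_write(&sdp->sd_log_flush_lock);
    }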
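
And a sketch of the direction Bob suggests: let recovery reposition
sd_log_flush_head only when the journal being replayed is the node's
own. This too is hypothetical; the jd_jid vs. ls_jid comparison is an
assumption about how "own journal" would be detected, and the recovery
write path would then need the replay position passed to it explicitly
rather than read from sd_log_flush_head:

    /* Sketch only: skip sd_log_flush_head for another node's journal. */
    if (jd->jd_jid == sdp->sd_lockstruct.ls_jid) {
            /* Our own journal: repositioning the flush head is safe
             * here because we hold our journal glock in EX. */
            sdp->sd_log_flush_head = head->lh_blkno;
            gfs2_replay_incr_blk(jd, &sdp->sd_log_flush_head);
    }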