From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ross Lagerwall <ross.lagerwall@citrix.com>
Date: Fri, 22 Mar 2019 18:15:12 +0000
Subject: [Cluster-devel] Assertion failure: sdp->sd_log_blks_free <=
 sdp->sd_jdesc->jd_blocks
In-Reply-To: <745330930.13931854.1553021309310.JavaMail.zimbra@redhat.com>
References: <fe00b6ce-7fb9-8769-0bc5-e830cebb1e75@citrix.com>
	<745330930.13931854.1553021309310.JavaMail.zimbra@redhat.com>
Message-ID: <06bfe342-bebb-e464-4759-eedc641db017@citrix.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On 3/19/19 6:48 PM, Bob Peterson wrote:
> ----- Original Message -----
>> Hi,
>>
>> Occasionally during testing, we see the following assertion failure in
>> log_pull_tail():
>>
>> [ 1104.061245] gfs2: fsid=xapi-clusterd:2d2cc24c-c48a-ca.0: fatal:
>> assertion "atomic_read(&sdp->sd_log_blks_free) <=
>> sdp->sd_jdesc->jd_blocks" failed
>> [ 1104.061245]    function = log_pull_tail, file = fs/gfs2/log.c, line = 510
>>
>> It always seems to happen shortly after journal recovery. I added some
>> debug logging at the point of the assertion failure and got the following:
> (snip)
>> Any ideas about this?
>>
>> Thanks,
>> --
>> Ross Lagerwall
> 
> Hi Ross,
> 
> I've been fighting with/debugging multiple recovery problems for a long time now.
> I've done countless (well, okay, thousands of) recovery tests and I can tell
> you that gfs2 recovery has some major problems. These problems usually don't
> occur when you have a few gfs2 mounts: they're much more likely when you have
> lots of gfs2 mounts. I'm using 32 mounts.
> 
> The problem you mentioned sounds vaguely familiar, but I can't find anything
> directly related. Make sure all your journals are the same size, and see if fsck.gfs2
> complains about the journal. Otherwise, it could be a side effect of one of the
> recovery issues I'm working on. Do you have other symptoms? Also, make sure
> multiple nodes aren't trying to use the same journal because of lock_nolock or
> something...I've made that mistake in the past.
> 

I think I've found the cause of the assertion I was hitting. Recovery 
sets sd_log_flush_head but does not take locks which means a concurrent 
call to gfs2_log_flush() can result in sd_log_head being set to 
sd_log_flush_head. A later call to gfs2_log_flush() will then hit an 
assertion failure in log_pull_tail() because the mismatch between 
sd_log_head and sd_log_tail means too many blocks are freed.

I've worked around it by taking the log_flush lock in the patch below 
and it seems to avoid the problem. However, tracing the recovery process 
I see that it sets sd_log_flush_head and then calls clean_journal() -> 
gfs2_write_log_header() -> gfs2_log_bmap() -> gfs2_log_incr_head(). This 
has:

	BUG_ON((sdp->sd_log_flush_head == sdp->sd_log_tail) &&
	       (sdp->sd_log_flush_head != sdp->sd_log_head));

... but sd_log_tail and sd_log_head have not been set by 
gfs2_recover_func() so it might still BUG_ON() during recovery if you're 
particularly unlucky.

I had a look at your "GFS2: Withdraw corruption patches [V2]" series but 
I didn't see anything that might fix this.

If you think this patch is useful then I can send it as a proper patch 
to the list.

Thanks,
Ross

--------------

gfs2: Take log_flush lock during recovery

Recovery sets sd_log_flush_head but does not take any locks which means
a concurrent call to gfs2_log_flush can result in sd_log_head being set
to sd_log_flush_head. A later call to gfs2_log_flush will then hit an
assertion failure in log_pull_tail because the mismatch between
sd_log_head and sd_log_tail means too many blocks are freed.

gfs2: fsid=xapi-clusterd:88a31b8e-4072-b0.1: fatal: assertion 
"atomic_read(&sdp->sd_log_blks_free) <= sdp->sd_jdesc->jd_blocks" failed
function = log_pull_tail, file = fs/gfs2/log.c, line = 510

diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
index 0f501f938d1c..5f7c3271e909 100644
--- a/fs/gfs2/recovery.c
+++ b/fs/gfs2/recovery.c
@@ -498,6 +498,8 @@ void gfs2_recover_func(struct work_struct *work)
  			goto fail_gunlock_thaw;
  		}

+		down_write(&sdp->sd_log_flush_lock);
+
  		t_tlck = ktime_get();
  		fs_info(sdp, "jid=%u: Replaying journal...\n", jd->jd_jid);

@@ -512,6 +514,8 @@ void gfs2_recover_func(struct work_struct *work)

  		clean_journal(jd, &head);

+		up_write(&sdp->sd_log_flush_lock);
+
  		gfs2_glock_dq_uninit(&thaw_gh);
  		t_rep = ktime_get();
  		fs_info(sdp, "jid=%u: Journal replayed in %lldms [jlck:%lldms, "