From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] Assertion failure: sdp->sd_log_blks_free <= sdp->sd_jdesc->jd_blocks
Date: Mon, 25 Mar 2019 11:29:17 -0400 (EDT)
Message-ID: <1543432043.15176772.1553527757339.JavaMail.zimbra@redhat.com>
In-Reply-To: <06bfe342-bebb-e464-4759-eedc641db017@citrix.com>

----- Original Message -----
> I think I've found the cause of the assertion I was hitting. Recovery
> sets sd_log_flush_head but does not take locks which means a concurrent
> call to gfs2_log_flush() can result in sd_log_head being set to
> sd_log_flush_head. A later call to gfs2_log_flush() will then hit an
> assertion failure in log_pull_tail() because the mismatch between
> sd_log_head and sd_log_tail means too many blocks are freed.
>
> I've worked around it by taking the log_flush lock in the patch below
> and it seems to avoid the problem. However, tracing the recovery process
> I see that it sets sd_log_flush_head and then calls clean_journal() ->
> gfs2_write_log_header() -> gfs2_log_bmap() -> gfs2_log_incr_head(). This
> has:
>
>     BUG_ON((sdp->sd_log_flush_head == sdp->sd_log_tail) &&
>            (sdp->sd_log_flush_head != sdp->sd_log_head));
>
> ... but sd_log_tail and sd_log_head have not been set by
> gfs2_recover_func() so it might still BUG_ON() during recovery if you're
> particularly unlucky.
>
> I had a look at your "GFS2: Withdraw corruption patches [V2]" series but
> I didn't see anything that might fix this.
>
> If you think this patch is useful then I can send it as a proper patch
> to the list.
>
> Thanks,
> Ross
>
> --------------
>
> gfs2: Take log_flush lock during recovery
>
> Recovery sets sd_log_flush_head but does not take any locks which means
> a concurrent call to gfs2_log_flush can result in sd_log_head being set
> to sd_log_flush_head. A later call to gfs2_log_flush will then hit an
> assertion failure in log_pull_tail because the mismatch between
> sd_log_head and sd_log_tail means too many blocks are freed.
>
> gfs2: fsid=xapi-clusterd:88a31b8e-4072-b0.1: fatal: assertion
> "atomic_read(&sdp->sd_log_blks_free) <= sdp->sd_jdesc->jd_blocks" failed
> function = log_pull_tail, file = fs/gfs2/log.c, line = 510
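
The arithmetic behind the assertion quoted above can be modelled in user
space. The following is a minimal sketch, not the kernel code: struct
log_model, pull_tail() and the return-code convention are hypothetical
stand-ins, and the model assumes that pulling the tail credits the circular
distance between the old and new tail to the free-block counter, as the
description of log_pull_tail above suggests.

```c
/* Hypothetical, simplified stand-in for the journal fields in the report. */
struct log_model {
	unsigned int jd_blocks;     /* journal size in blocks */
	unsigned int log_tail;      /* oldest block still in use */
	unsigned int log_blks_free; /* blocks available for new log entries */
};

/* Circular distance from 'older' forward to 'newer' within the journal. */
static unsigned int log_distance(const struct log_model *m,
				 unsigned int newer, unsigned int older)
{
	int dist = (int)newer - (int)older;

	if (dist < 0)
		dist += (int)m->jd_blocks;
	return (unsigned int)dist;
}

/*
 * Advance the tail and credit the freed blocks.
 * Returns 0 if log_blks_free <= jd_blocks still holds, -1 otherwise
 * (the condition the kernel assertion checks).
 */
static int pull_tail(struct log_model *m, unsigned int new_tail)
{
	m->log_blks_free += log_distance(m, new_tail, m->log_tail);
	m->log_tail = new_tail;
	return m->log_blks_free <= m->jd_blocks ? 0 : -1;
}
```

For example, with a 1000-block journal, the tail at block 200 and 900 blocks
free, pull_tail(&m, 250) returns 0 and leaves 950 blocks free. If the new
tail is instead derived from a head that belongs to some other journal
position, say 700, the same call credits 500 blocks, the counter reaches
1400, and the function returns -1.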

Hi Ross,

I think you found a valid bug, but that's not the proper solution.
The reason is: I think journal replay and journal flushing should both be
protected by the exclusive (EX) glock taken on the journal itself.

I think the problem may actually be a regression with patch 588bff95c94ef.
Because of that patch, function clean_journal now sets sdp->sd_log_flush_head,
but its caller, gfs2_recover_func, is used to recover any node's journal, not
just its own. The bug is that clean_journal should set sd_log_flush_head
if (and only if) it's replaying its own journal, not someone else's.
If it sets sd_log_flush_head while replaying another node's journal, that
will only lead to a problem like this.
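
The condition described above can be sketched as a guard on the journal
being recovered. This is only an illustration under stated assumptions, not
the patch itself: struct jdesc_model, struct sbd_model,
replaying_own_journal() and clean_journal_model() are hypothetical names,
and the sketch assumes that "its own journal" can be recognised by
comparing the journal descriptor under recovery with sdp->sd_jdesc.

```c
#include <stdbool.h>

/* Hypothetical, pared-down stand-ins for the kernel structures. */
struct jdesc_model {
	unsigned int jd_jid;             /* journal id */
};

struct sbd_model {
	struct jdesc_model *sd_jdesc;    /* this node's own journal */
	unsigned int sd_log_flush_head;  /* flush position in the own journal */
};

/* True only when the journal being recovered is the node's own. */
static bool replaying_own_journal(const struct sbd_model *sdp,
				  const struct jdesc_model *jd)
{
	return sdp->sd_jdesc == jd;
}

/*
 * Models the end of recovery: 'lblock' is where the clean header goes.
 * The node's flush head is updated only for its own journal; when
 * recovering another node's journal it is left untouched.
 */
static void clean_journal_model(struct sbd_model *sdp,
				const struct jdesc_model *jd,
				unsigned int lblock)
{
	if (replaying_own_journal(sdp, jd))
		sdp->sd_log_flush_head = lblock;
}
```

In this model, recovering another node's journal leaves sd_log_flush_head
as it was, so a concurrent flush of the node's own journal never sees a
position that belongs to a different journal.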

I'll try and whip up another patch and perhaps you can test it for me.
FWIW, I've never seen this problem manifest on my recovery tests, but it
still might be causing some of the weird problems I'm seeing.

Regards,

Bob Peterson
Red Hat File Systems