From: Brian Foster <bfoster@redhat.com>
To: Sweet Tea Dorminy <sweettea@permabit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: XFS journal write ordering constraints?
Date: Fri, 9 Jun 2017 13:30:52 -0400 [thread overview]
Message-ID: <20170609173049.GC10685@bfoster.bfoster> (raw)
In-Reply-To: <20170609123829.GA10685@bfoster.bfoster>
On Fri, Jun 09, 2017 at 08:38:32AM -0400, Brian Foster wrote:
> On Thu, Jun 08, 2017 at 11:42:11AM -0400, Sweet Tea Dorminy wrote:
> > Greetings;
> >
> > When using XFS with a 1k block size atop our device, we regularly get
> > "log record CRC mismatch"es when mounting XFS after a crash, and we
> > are attempting to understand why. We are using RHEL7.3 with its kernel
> > 3.10.0-514.10.2.el7.x86_64, xfsprogs version 4.5.0.
> >
> > Tracing indicates the following situation occurs:
> > Some pair of consecutive locations contains data A1 and B1, respectively.
> > The XFS journal issues new writes to those locations,
> > containing data A2 and B2.
> > The write of B2 finishes, but A2 is still outstanding at the
> > time of the crash.
> > Crash occurs. The data on disk is A1 and B2, respectively.
> > XFS fails to mount, complaining that the checksum mismatches.
> >
> > Does XFS expect sequentially issued journal IO to be committed to disk
> > in the order of issuance due to the use of FUA?
> >
>
> Hmm, I don't believe there is any such sequential I/O ordering
> constraint, but the log is complex and I could be missing something. We
> do have higher level ordering rules in various places. For example,
> commit records are written to the in-core logs in order. It also looks
> like in-core log I/O completion takes explicit measures to process
> callbacks in order in the event that the associated I/Os do not complete
> in order. That tends to imply there is no explicit log I/O submission
> ordering in place.
>
> Of course, that also implies that log recovery should be able to handle
> this situation just the same. I'm not quite sure what the expected log
> recovery behavior is off the top of my head, but my initial guess would
> be that the log LSN stamping could help us identify the valid part of
> the log during head/tail discovery.
>
After digging a bit more into the log recovery code, this does actually
appear to be the case. The process of finding the head of the log at
mount time starts with a rough approximation of the head location based
on cycle numbers which are stamped into the first bytes of every sector
written to the log. From there, it searches backwards over a range of
blocks, sized by the maximum in-core log buffer concurrency allowed by
the fs, to determine whether any "holes" (blocks whose writes never
completed) exist in that range. If so, the head is walked back to the
first such "hole," effectively working around out-of-order buffer
completion at the time of a filesystem crash.
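Roughly speaking, the walk-back behaves like the toy model below. This is
a deliberately simplified sketch, not the actual kernel head-discovery
code; the sector layout, function name, and the idea of a flat list of
per-sector cycle numbers are all invented for illustration:

```python
# Toy model of the head-search walk-back described above. Each log
# "sector" is stamped with the cycle number it was written under; a
# "hole" is a sector still carrying stale data because its write I/O
# never completed before the crash.

def find_log_head(cycles, rough_head, max_inflight_sectors):
    """Walk back from a rough head estimate to the first gap ("hole"),
    so torn writes near the head are excluded from recovery.

    cycles: per-sector cycle numbers as found on disk.
    rough_head: index just past the last sector stamped with the
    current cycle (the rough approximation of the head).
    max_inflight_sectors: bound on how much log I/O may be in flight,
    and hence how far back a hole can appear (assumed simplification).
    """
    cur_cycle = cycles[rough_head - 1]
    start = max(0, rough_head - max_inflight_sectors)
    for i in range(start, rough_head):
        if cycles[i] != cur_cycle:   # stale sector: write never landed
            return i                 # head walks back to first hole
    return rough_head                # no holes; rough head stands

# Cycle 2 writes reached sectors 0-2 and 4, but sector 3's I/O was
# lost in the crash, so the recovered head excludes sectors 3 and 4:
on_disk = [2, 2, 2, 1, 2]
print(find_log_head(on_disk, rough_head=5, max_inflight_sectors=4))  # -> 3
```

In this model, everything at or beyond the returned index is treated as
outside the active log, so the torn sector never gets CRC-checked.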
This basically means that such ranges are not part of the active log to
be recovered and thus should not lead to CRC errors. So if the
granularity of the ranges noted above is something like the size of a
log buffer and resides towards the end of the active log, it seems more
likely this could be expected behavior and not the source of the
problem. If the granularity is something smaller (e.g., a sector), it
seems more likely something is wrong beneath the filesystem, or if the
range is larger but much farther behind the head, then the problem could
be something else entirely.
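To make the distinction concrete: a torn write *inside* a single log
record is exactly what trips the record CRC check, because the CRC
stamped at write time covers the whole record. A minimal sketch, using
zlib.crc32 as a stand-in for the CRC32c XFS actually uses, and an
invented record layout rather than the on-disk format:

```python
# Sketch of why a torn write within one log record fails verification.
# zlib.crc32 stands in for CRC32c; the payload layout is invented.
import zlib

def make_record(payload: bytes):
    """Stamp a CRC over the full record payload, as done at write time."""
    return zlib.crc32(payload), payload

# Record written as two halves, A2 then B2, with one CRC over both:
crc, payload = make_record(b"A2" * 256 + b"B2" * 256)

# Crash: the B2 half reached disk but the A2 half did not, so stale
# A1 data remains under a CRC that was computed over A2 + B2.
torn = b"A1" * 256 + b"B2" * 256
print(zlib.crc32(torn) == crc)  # -> False: "log record CRC mismatch"
```

If instead the torn range is trimmed off by the head walk-back (an
entire log buffer at the tail of the active log), the record is never
checked at all, which is why the granularity and position of the torn
range matters for diagnosing this.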
(When looking through some of this, I also noticed that log recovery
leaks memory for partial transactions. Thanks! :P).
Brian
> Anyways, I think more information is required to try and understand what
> is happening in your situation. What is the xfs_info for this
> filesystem? What granularity are these A and B regions (sectors or
> larger)? Are you running on some kind of special block device that
> reproduces this? Do you have a consistent reproducer and/or have you
> reproduced on an upstream kernel? Could you provide an xfs_metadump
> image of the filesystem that fails log recovery with CRC errors?
>
> Brian
>
> > Thanks!
> >
> > Sweet Tea Dorminy
> > Permabit Technology Corporation
> > Cambridge, MA
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html