From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 4/5] xfs: log forces imply data device cache flushes
Date: Thu, 22 Jul 2021 16:13:46 -0700 [thread overview]
Message-ID: <20210722231346.GP559212@magnolia> (raw)
In-Reply-To: <20210722221258.GQ664593@dread.disaster.area>
On Fri, Jul 23, 2021 at 08:12:58AM +1000, Dave Chinner wrote:
> On Thu, Jul 22, 2021 at 12:30:18PM -0700, Darrick J. Wong wrote:
> > On Thu, Jul 22, 2021 at 11:53:34AM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > After fixing the tail_lsn vs cache flush race, generic/482 continued
> > > to fail in a similar way where cache flushes were missing before
> > > iclog FUA writes. Tracing of iclog state changes during the fsstress
> >
> > Heh. ;)
> ....
> > > + * xlog_cil_force_seq() call, but there are other writers still
> > > + * accessing it so it hasn't been pushed to disk yet. Like the
> > > + * ACTIVE case above, we need to make sure caches are flushed
> > > + * when this iclog is written.
> > > + */
> > > + iclog->ic_flags |= XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA;
> > > + if (log_flushed)
> > > + *log_flushed = 1;
> > > + break;
> > > + default:
> > > + /*
> > > + * The entire checkpoint was written by the CIL force and is on
> > > + * it's way to disk already. It will be stable when it
> >
> > s/it's/its/
> >
> > So now that we're at the end of this series, what are the rules for when
> > when issue cache flushes and FUA writes?
> >
> > - Writing the unmount record always flushes the data and log devices.
> > Does it need to flush the rt device too? I guess xfs_free_buftarg
> > does that.
>
> Correct. RT device behaviour is unchanged as it only contains data
> and data is already guaranteed to be on stable storage before we
> write the unmount record.
>
> > - Start an async flush of the data device when doing CIL push work so
> > that anything the AIL wrote to disk (and pushed the tail) is persisted
> > before we assign a tail to the log record that we write at the end?
> >
> > - If any other AIL work completes (and pushes the tail ahead) by the
> > time we actually write the log record, flush the data device a second
> > time?
>
> Yes.
>
> > - If a log checkpoint spans multiple iclogs, flush the *log* device
> > before writing the iclog with the commit record in it.
>
> Yes. And for internal logs we have the natural optimisation that
> these two cases are handled by same cache flush and so for large
> checkpoints on internal logs we don't see lot tail update races.
>
> > - Any time we write an iclog that commits a checkpoint, write that
> > record with FUA to ensure it's persisted.
>
> *nod*
>
> > - If we're forcing the log to disk as part of an integrity operation
> > (fsync, syncfs, etc.) then issue cache flushes for ... each? iclog
> > written to disk? And use FUA for that write too?
>
> This is where it gets messy, because log forces are not based around
> checkpoint completions. Hence we have no idea what is actually in
> the iclog we are flushing and so must treat them all as if they
> contain a commit record, close off a multi-iclog checkpoint, and
> might have raced with a log tail update. We don't know - and can't
> know from the iclog state - which conditions exist and so we have to
> assume that at least one of the above states exist for any ACTIVE or
> WANT_SYNC iclog we end flushing or up waiting on.
>
> If the iclog is already on it's way to disk, and it contains a
> commit record, then the cache flush requirements for
> metadata/journal ordering have already been met and we don't need to
> do anything other than wait. But if we have to flush the iclog or
> wait for a flush by a third party, we need to ensure that cache
> flushes occur so that the log force semantics are upheld.
>
> If the iclog doesn't contain a commit record (i.e. a log force in
> the middle of a new, racing checkpoint write) we don't actually care
> if the iclog contains flushes or not, because a crash immediately
> after the log force won't actually recover the checkpoint contained
> in that iclog. From the log force perspective, the iclog contains
> future changes, so we don't care about whether it can be recovered.
> But we don't know this, so we have to issue cache flushes on every
> iclog we flush from the log force code.
>
> This is why I mentioned that the log force code needs to be turned
> inside out to guarantee CIL checkpoints are flushed and stable
> rather than iclogs. We care about whole checkpoints being
> recoverable, not whether some random iclog in the middle of a
> checkpoint write is stable....
<nod> Ok, that's kinda what I thought -- the first few "where do we
flush?" cases were pretty straightforward, and the last one is murky.
With all the random little changes applied,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
next prev parent reply other threads:[~2021-07-22 23:13 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-22 1:53 [PATCH 0/5] xfs: fix log cache flush regressions Dave Chinner
2021-07-22 1:53 ` [PATCH 1/5] xfs: flush data dev on external log write Dave Chinner
2021-07-22 6:41 ` Christoph Hellwig
2021-07-22 15:52 ` Darrick J. Wong
2021-07-22 1:53 ` [PATCH 2/5] xfs: external logs need to flush data device Dave Chinner
2021-07-22 6:48 ` Christoph Hellwig
2021-07-22 18:14 ` Darrick J. Wong
2021-07-22 21:45 ` Dave Chinner
2021-07-22 23:10 ` Darrick J. Wong
2021-07-22 1:53 ` [PATCH 3/5] xfs: fix ordering violation between cache flushes and tail updates Dave Chinner
2021-07-22 7:06 ` Christoph Hellwig
2021-07-22 7:28 ` Dave Chinner
2021-07-22 19:12 ` Darrick J. Wong
2021-07-22 1:53 ` [PATCH 4/5] xfs: log forces imply data device cache flushes Dave Chinner
2021-07-22 7:14 ` Christoph Hellwig
2021-07-22 7:32 ` Dave Chinner
2021-07-22 19:30 ` Darrick J. Wong
2021-07-22 22:12 ` Dave Chinner
2021-07-22 23:13 ` Darrick J. Wong [this message]
2021-07-22 1:53 ` [PATCH 5/5] xfs: avoid unnecessary waits in xfs_log_force_lsn() Dave Chinner
2021-07-22 7:15 ` Christoph Hellwig
2021-07-22 19:13 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210722231346.GP559212@magnolia \
--to=djwong@kernel.org \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox