linux-xfs.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Joshi <joshiiitr@gmail.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: xfs log write design
Date: Fri, 31 Aug 2018 10:33:25 +1000
Message-ID: <20180831003325.GH5631@dastard>
In-Reply-To: <CA+1E3rJCdcLJVYrj89H1HQDr7ETpZzQzueb6oqg3eJnNZ_q+UQ@mail.gmail.com>

On Thu, Aug 30, 2018 at 09:57:50PM +0530, Joshi wrote:
> This must be novice topic for the list, please excuse the ignorance.
> 
> When it comes to log write scheme in XFS, I wonder if I can draw
> parallel with any journaling mode of Ext4 (ordered or writeback)?

Neither, really.

The journal records metadata operations in the order they occur but
not data operations (as ext4 does in writeback mode). However, XFS
uses an "update on IO completion" model for data-related metadata
operations, so the observable behaviour of ext4's ordered mode ended
up being very similar to XFS's behaviour.
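
A minimal sketch of the "update on IO completion" idea, as a toy model
rather than XFS code: the class and method names below are hypothetical,
and the only behaviour it assumes is that the metadata exposing new data
(here, a file size) is recorded after the data IO has completed.

```python
# Toy model only: every name here is hypothetical, nothing is XFS code.
class ToyFile:
    def __init__(self):
        self.stable_data = bytearray()  # data that has reached the disk
        self.journaled_size = 0         # file size recorded via the journal
        self.in_flight = []             # submitted, not yet completed writes

    def write(self, offset, payload):
        # Submitting a data write changes no journaled metadata.
        self.in_flight.append((offset, bytes(payload)))

    def complete_one(self):
        # IO completion: the data is on disk first, and only then is the
        # size update recorded.
        offset, payload = self.in_flight.pop(0)
        end = offset + len(payload)
        if len(self.stable_data) < end:
            self.stable_data.extend(b"\0" * (end - len(self.stable_data)))
        self.stable_data[offset:end] = payload
        self.journaled_size = max(self.journaled_size, end)

    def crash_view(self):
        # After a crash, only data covered by the journaled size is visible.
        return bytes(self.stable_data[:self.journaled_size])


f = ToyFile()
f.write(0, b"hello")
f.write(5, b"world")
f.complete_one()          # only the first write has completed
print(f.crash_view())
```

In this model a crash never exposes a size that covers data which was
not written, which is the externally visible property that ordered mode
also aims for.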

[ Keep in mind that ext4 ordered mode is not the same as ext3 ordered
mode - ext4's behaviour is a hybrid writeback model because of
delayed allocation and not wanting the ext3 sync-the-world fsync()
problem. Another thing to keep in mind is that ext4 copied a fair
number of XFS behaviours to avoid data loss in delayed allocation
crash situations after ext4 "rediscovered" all the issues fixed
in XFS over 10+ years of using delayed allocation. ]

> I checked xlog_sync() code, and found that each log IO is marked with
> PREFLUSH and FUA (for internal log case).
> This perhaps makes it similar to "ordered" journal mode of Ext4.

No, the flushes have nothing to do with this. They are about
ensuring completion-to-submission IO ordering constraints are
enforced at the storage level.

> But I am not sure about exact intent of choosing PREFLUSH for log
> write

If we don't flush the cache prior to the log write and we crash, the
log write might be on stable storage, but metadata we've written
back and been told is complete may not be. i.e. the log write can
overwrite metadata in the log that may not yet be on stable storage
if we don't do a pre-flush to ensure completion-to-submission
ordering is enforced right down to the stable media.
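
The scenario above can be sketched with a toy disk that has a volatile
write cache. This is an illustration under stated assumptions, not
kernel code: ToyDisk, run() and the block names are hypothetical, and
the preflush/fua arguments only mimic the intent of PREFLUSH (flush
earlier completed writes first) and FUA (this write goes to stable
media).

```python
# Toy model of a disk with a volatile write cache; not kernel code.
class ToyDisk:
    def __init__(self):
        self.cache = {}   # completed writes still in the volatile cache
        self.media = {}   # contents of stable storage

    def flush(self):
        # Push every cached write down to stable media.
        self.media.update(self.cache)
        self.cache.clear()

    def write(self, block, value, preflush=False, fua=False):
        if preflush:
            # Make all previously completed writes stable first.
            self.flush()
        if fua:
            # This write bypasses the volatile cache.
            self.media[block] = value
            self.cache.pop(block, None)
        else:
            self.cache[block] = value

    def crash(self):
        # Power loss: anything only in the volatile cache is gone.
        self.cache.clear()
        return dict(self.media)


def run(preflush):
    disk = ToyDisk()
    # The log holds the only stable copy of a change to metadata block M.
    disk.write("log", "record A: change to M", fua=True)
    # Metadata writeback of M completes, but may sit in the cache.
    disk.write("M", "new M")
    # A later log write reuses the space that held record A.
    disk.write("log", "record B", preflush=preflush, fua=True)
    return disk.crash()


print(run(preflush=False))
print(run(preflush=True))
```

Without the pre-flush, the crash leaves neither record A nor the new
contents of M on stable media; with it, M is stable before record A is
overwritten.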

> i.e.whether it is for all previous non-log IO (meta and data) or
> only for meta-IO. Nor I am sure whether xfs makes a conscious choice
> to issue data writes before meta or journal I/Os.

XFS does not control the order of data writes except for a few
corner cases where expeditious writing of data masks common
application-level data integrity bugs (typically unsafe overwrite
operations).
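
For contrast, here is a sketch of an overwrite that does not rely on the
filesystem writing data early. The message does not name the corner
cases, so this rests on an assumption: that an "unsafe overwrite" means
replacing a file's contents without fsync() before the new contents are
made visible. The helper name replace_file_safely is hypothetical, and
the directory fsync assumes a POSIX system such as Linux.

```python
import os
import tempfile


def replace_file_safely(path, data):
    """Replace path with data so a crash leaves the old or new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new contents to a temporary file in the same directory.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # The data must be stable before the rename makes it visible.
            os.fsync(tmp.fileno())
        # Atomically switch the name over to the new file.
        os.replace(tmp_path, path)
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Make the rename itself stable by syncing the directory.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "config.txt")
    replace_file_safely(target, b"version 1\n")
    replace_file_safely(target, b"version 2\n")
    with open(target, "rb") as fh:
        print(fh.read())
```

An application that orders its own data writes this way does not depend
on any filesystem-specific early writeback.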

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
