From: Dave Chinner <david@fromorbit.com>
To: Shan Hai <shan.hai@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH RFC 0/8] xfs: introduce inode data inline feature
Date: Fri, 6 Jul 2018 15:42:50 +1000
Message-ID: <20180706054250.GU2234@dastard>
In-Reply-To: <1530846750-6686-1-git-send-email-shan.hai@oracle.com>

On Fri, Jul 06, 2018 at 11:12:21AM +0800, Shan Hai wrote:
> 
> This series implements xfs inode data inlining feature.

Cool. I was actually thinking about this earlier this week. :)

> Referred to the link below during development:
> https://marc.info/?l=linux-xfs&m=120493585731509&w=2
> 
> 
> How it works:
> - the data inlining happens at:
>   write_iter: for DIO/DAX write
>   writeback: for buffered write
> - extents to local format conversion is done in writeback but not in write_iter
> - local to extents format conversion is done in the write_iter and writeback

The problem with the way local to extents conversion is implemented
in this patchset is that we do not synchronise data writeback with
journal commits and log recovery.  The local to extents conversion
appears to be done like so:

----
writeback arrives in local format, data in extent cached page
convert inode to extent format
allocate new block
commit allocation and inode format conversion
submit data IO to newly allocated block.
----

This works when nothing goes wrong, but it's not failsafe. The
problem is that we've logged the inode data fork conversion before
we've completed writing the data to the new block. IOWs, we can end
up in the state where the active journal recovery window does not
contain the data in the inode and the data has not yet reached the
newly allocated block on disk, either.

If we crash in this window, we lose the data that was in the inode -
log recovery finishes with the inode in extent form pointing to
uninitialised data blocks. i.e. not only is it a data loss event,
it's also a security issue because it exposes stale data.
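
To make the window concrete, here is a toy model in Python of the
ordering described above. It is only a sketch, not XFS code: the names
(unsafe_steps, state_after_crash, STALE, DATA) are made up for
illustration, and it assumes the journal commit and the data write are
the only two durable state changes involved.

```python
# Toy model of "log the conversion, then write the data". Not XFS code;
# every name here is hypothetical and exists only for illustration.

STALE = b"stale-secret"   # whatever the block held before it was allocated
DATA = b"inline-data"     # file data that lived in the inode

def unsafe_steps():
    """Durable state changes, in the order the patchset appears to do them."""
    return ["log_conversion", "write_data"]

def state_after_crash(steps_completed):
    """Apply the first `steps_completed` steps, crash, and return what a
    reader sees in the file after log recovery."""
    inode_format = "local"
    block = STALE
    for step in unsafe_steps()[:steps_completed]:
        if step == "log_conversion":
            # The journal now says: extent format, pointing at the new block.
            inode_format = "extents"
        elif step == "write_data":
            block = DATA
    # Local format reads from the inode; extent format reads from the block.
    return DATA if inode_format == "local" else block

for n in range(len(unsafe_steps()) + 1):
    print(n, state_after_crash(n))
```

A crash after the first step but before the second is the case where
the reader gets the old block contents back instead of the file data.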

This has been the unsolved problem from all previous attempts to
implement inline data in XFS. I actually outlined this problem and
ways to solve it in the mailing list post linked above, but it
doesn't appear that either mechanism was implemented in this
patchset.

However, unlike that post from 2008, we now have infrastructure that
can be used to solve this problem: the data path copy-on-write
mechanism.  The COW mechanism is essentially a generic
implementation of the "Method 1: use an intent/done transaction
pair" solution I describe in the above post.

This mechanism solves the above problem by storing the newly
allocated block in the in-memory COW fork and not modifying the
data fork until after the data write IO completes. IOWs, we do
allocation before the write IO, and do the data fork and BMBT
manipulation after the IO completes. i.e. we do the local->extent
data fork modification at IO completion using the extent that was
stored in the in-memory COW fork.
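
The ordering argument can be checked with a small Python toy model.
Again this is a sketch under stated assumptions, not how the kernel
implements it: the step names and recovered_contents() are
hypothetical, staging the block in the COW fork is treated as purely
in-memory state, and the model ignores what happens to the staged
block itself after a crash.

```python
# Toy comparison of step orderings. Not XFS code; all names are hypothetical.

STALE, DATA = b"stale-secret", b"inline-data"

# COW-style ordering: allocate first, write data, convert the data fork last.
COW_ORDER = ["stage_block_in_cow_fork", "write_data", "log_data_fork_conversion"]
# Ordering where the conversion is logged before the data write completes.
UNSAFE_ORDER = ["log_data_fork_conversion", "write_data"]

def recovered_contents(order, steps_completed):
    """Run the first `steps_completed` steps of `order`, crash, and return
    what a reader sees in the file after log recovery."""
    inode_format, block = "local", STALE
    for step in order[:steps_completed]:
        if step == "write_data":
            block = DATA
        elif step == "log_data_fork_conversion":
            inode_format = "extents"
        # "stage_block_in_cow_fork" changes nothing durable in this model,
        # so a crash simply discards it.
    return DATA if inode_format == "local" else block

cow_results = [recovered_contents(COW_ORDER, n) for n in range(len(COW_ORDER) + 1)]
unsafe_results = [recovered_contents(UNSAFE_ORDER, n) for n in range(len(UNSAFE_ORDER) + 1)]
print("cow:   ", cow_results)
print("unsafe:", unsafe_results)
```

With the conversion logged last, every crash point in the model
recovers to the file data; with it logged first, one crash point
recovers to the old block contents.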

Hence I think the inline data write path needs to piggyback on the
iomap COW path that we use for writing to shared extents if the
write would cause a data fork format change.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
