linux-xfs.vger.kernel.org archive mirror
From: Shan Hai <shan.hai@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH RFC 0/8] xfs: introduce inode data inline feature
Date: Fri, 6 Jul 2018 14:39:29 +0800
Message-ID: <85b8d2ed-5749-c357-96c3-0f6d0d424d6e@oracle.com>
In-Reply-To: <20180706054250.GU2234@dastard>



On 2018-07-06 13:42, Dave Chinner wrote:
> On Fri, Jul 06, 2018 at 11:12:21AM +0800, Shan Hai wrote:
>> This series implements xfs inode data inlining feature.
> Cool. I was actually thinking about this earlier this week. :)
>

Yes, this is a nice feature. Thanks for your earlier suggestions in the
post linked below; they saved me a lot of time :)

>> Referred to the link below during development:
>> https://marc.info/?l=linux-xfs&m=120493585731509&w=2
>>
>>
>> How it works:
>> - the data inlining happens at:
>>    write_iter: for DIO/DAX write
>>    writeback: for buffered write
>> - extents to local format conversion is done in writeback but not in write_iter
>> - local to extents format conversion is done in the write_iter and writeback
> The problem with the way local to extents conversion is implemented
> in this patchset is that we do not synchronise data writeback with
> journal commits and log recovery.  The local to extents conversion
> appears to be done like so:
>
> ----
> writeback arrives in local format, data in extent cached page
> convert inode to extent format
> allocate new block
> commit allocation and inode format conversion
> submit data IO to newly allocated block.
> ----
>
> This works when nothing goes wrong, but it's not failsafe. The
> problem is that we've logged the inode data fork conversion before
> we've completed writing the data to the new block. IOWs, we can end
> up in the state where the active journal recovery window does not contain
> the data in the inode and the data has not yet reached the new
> allocated block on disk, either.
>
> If we crash in this window, we lose the data that was in the inode -
> log recovery finishes with the inode in extent form pointing to
> uninitialised data blocks. i.e. not only is it a data loss event,
> it's also a security issue because it exposes stale data.
>
> This has been the unsolved problem from all previous attempts to
> implement inline data in XFS. I actually outlined this problem and
> ways to solve it in the mailing list post linked above, but it
> doesn't appear that either mechanism was implemented in this
> patchset.
>
> However, unlike that post from 2008, we now have infrastructure that
> can be used to solve this problem: the data path copy-on-write
> mechanism.  The COW mechanism is essentially a generic
> implementation of the "Method 1: use an intent/done transaction
> pair" solution I describe in the above post.
>
> This mechanism solves the above problem by storing the newly
> allocated block in the in-memory COW fork and doesn't modify the
> data fork until after the data write IO completes. IOWs, we do
> allocation before the write IO, and do the data fork and BMBT
> manipulation after the IO completes. i.e. we do the local->extent
> data fork modification at IO completion using the extent that was
> stored in the in memory COW fork.

Yes, this would close the race completely; I will try to implement it in
the next version of the series.

Or how about the approach below? It is similar to what we do in
xfs_setattr_size(), though I know the synchronous write here is ugly :)

----
write arrives at local inode
allocate a page, insert it into the page cache, and copy the data from the data fork to the page
filemap_write_and_wait_range
xfs_trans_roll
convert inode to extent format
allocate a new block
commit allocation and inode format conversion
----

Thanks
Shan Hai
> Hence I think the inline data write path needs to piggyback on the
> iomap COW path that we use for writing to shared extents if the
> write would cause a data fork format change.
>
> Cheers,
>
> Dave.



Thread overview: 50+ messages
2018-07-06  3:12 [PATCH RFC 0/8] xfs: introduce inode data inline feature Shan Hai
2018-07-06  3:12 ` [PATCH RFC 1/8] xfs: introduce inline data superblock feature bit Shan Hai
2018-07-06  3:34   ` Darrick J. Wong
2018-07-06  4:06     ` Shan Hai
2018-07-06  3:12 ` [PATCH RFC 2/8] xfs: introduce extents to local conversion helper Shan Hai
2018-07-06  3:45   ` Darrick J. Wong
2018-07-06  4:15     ` Shan Hai
2018-07-08 15:42   ` Christoph Hellwig
2018-07-09  1:58     ` Shan Hai
2018-07-06  3:12 ` [PATCH RFC 3/8] xfs: convert inode from extents to local format Shan Hai
2018-07-06  3:47   ` Darrick J. Wong
2018-07-06  4:24     ` Shan Hai
2018-07-06  3:12 ` [PATCH RFC 4/8] xfs: implement inline data read write code Shan Hai
2018-07-06  3:33   ` Darrick J. Wong
2018-07-06  4:05     ` Shan Hai
2018-07-08 15:45   ` Christoph Hellwig
2018-07-09  2:08     ` Shan Hai
2018-07-06  3:12 ` [PATCH RFC 5/8] xfs: consider the local format inode in misc operations Shan Hai
2018-07-06  3:40   ` Darrick J. Wong
2018-07-06  4:40     ` Shan Hai
2018-07-08 15:51   ` Christoph Hellwig
2018-07-09  3:06     ` Shan Hai
2018-07-06  3:12 ` [PATCH RFC 6/8] xfs: fix imbalanced locking Shan Hai
2018-07-08 15:53   ` Christoph Hellwig
2018-07-09  3:07     ` Shan Hai
2018-07-06  3:12 ` [PATCH RFC 7/8] xfs: return non-zero blocks for inline data Shan Hai
2018-07-08 15:54   ` Christoph Hellwig
2018-07-11 13:08   ` Carlos Maiolino
2018-07-12  1:03     ` Shan Hai
2018-07-12  1:13       ` Shan Hai
2018-07-12  1:31         ` Darrick J. Wong
2018-07-12  1:46           ` Shan Hai
2018-07-12  9:08             ` Carlos Maiolino
2018-07-12 10:48               ` Shan Hai
2018-07-13 12:39                 ` Carlos Maiolino
2018-07-17 13:57                   ` Christoph Hellwig
2018-07-18 15:03                     ` Carlos Maiolino
2018-07-06  3:12 ` [PATCH RFC 8/8] xfs: skip local format inode for reflinking Shan Hai
2018-07-06  3:26   ` Darrick J. Wong
2018-07-06  3:54     ` Shan Hai
2018-07-08 16:00     ` Christoph Hellwig
2018-07-06  3:12 ` [PATCH RFC 1/1] xfsprogs: add inode inline data support Shan Hai
2018-07-06  3:35   ` Darrick J. Wong
2018-07-06 19:14     ` Eric Sandeen
2018-07-06  3:51 ` [PATCH RFC 0/8] xfs: introduce inode data inline feature Darrick J. Wong
2018-07-06  4:09   ` Shan Hai
2018-07-06  5:42 ` Dave Chinner
2018-07-06  6:39   ` Shan Hai [this message]
2018-07-06  7:11     ` Shan Hai
2018-07-08 15:58   ` Christoph Hellwig
