From: Shan Hai <shan.hai@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH RFC 0/8] xfs: introduce inode data inline feature
Date: Fri, 6 Jul 2018 15:11:19 +0800
Message-ID: <4e1632e5-d470-bf90-88f5-b82e4e18e7f9@oracle.com>
In-Reply-To: <85b8d2ed-5749-c357-96c3-0f6d0d424d6e@oracle.com>
On 2018-07-06 14:39, Shan Hai wrote:
>
>
> On 2018-07-06 13:42, Dave Chinner wrote:
>> On Fri, Jul 06, 2018 at 11:12:21AM +0800, Shan Hai wrote:
>>> This series implements the XFS inode data inlining feature.
>> Cool. I was actually thinking about this earlier this week. :)
>>
>
> Yes, this is a nice feature, and thanks for your earlier suggestions at
> the link below, which saved me a lot of time :)
>
>>> Referred to the link below during development:
>>> https://marc.info/?l=linux-xfs&m=120493585731509&w=2
>>>
>>>
>>> How it works:
>>> - the data inlining happens at:
>>>     write_iter: for DIO/DAX write
>>>     writeback: for buffered write
>>> - extents to local format conversion is done in writeback but not in
>>>   write_iter
>>> - local to extents format conversion is done in the write_iter and
>>>   writeback
>> The problem with the way local to extents conversion is implemented
>> in this patchset is that we do not synchronise data writeback with
>> journal commits and log recovery. The local to extents conversion
>> appears to be done like so:
>>
>> ----
>> writeback arrives in local format, data in extent cached page
>> convert inode to extent format
>> allocate new block
>> commit allocation and inode format conversion
>> submit data IO to newly allocated block.
>> ----
>>
>> This works when nothing goes wrong, but it's not failsafe. The
>> problem is that we've logged the inode data fork conversion before
>> we've completed writing the data to the new block. IOWs, we can end
>> up in the state where the active journal recovery window does not contain
>> the data in the inode and the data has not yet reached the newly
>> allocated block on disk, either.
>>
>> If we crash in this window, we lose the data that was in the inode -
>> log recovery finishes with the inode in extent form pointing to
>> uninitialised data blocks. i.e. not only is it a data loss event,
>> it's also a security issue because it exposes stale data.
>>
>> This has been the unsolved problem from all previous attempts to
>> implement inline data in XFS. I actually outlined this problem and
>> ways to solve it in the mailing list post linked above, but it
>> doesn't appear that either mechanism was implemented in this
>> patchset.
>>
>> However, unlike that post from 2008, we now have infrastructure that
>> can be used to solve this problem: the data path copy-on-write
>> mechanism. The COW mechanism is essentially a generic
>> implementation of the "Method 1: use an intent/done transaction
>> pair" solution I describe in the above post.
>>
>> This mechanism solves the above problem by storing the newly
>> allocated block in the in-memory COW fork and doesn't modify the
>> data fork until after the data write IO completes. IOWs, we do
>> allocation before the write IO, and do the data fork and BMBT
>> manipulation after the IO completes. i.e. we do the local->extent
>> data fork modification at IO completion using the extent that was
>> stored in the in-memory COW fork.
>
> Yes, this would fix the race completely. I will try to implement it in
> the subsequent series.
>
> Or how about the sequence below? It is similar to what we do in
> xfs_setattr_size, but I know the synchronous write in this situation
> is ugly :)
>
> ----
> write arrives at local inode
> allocate a page and insert it into page cache, copy the data from data
> fork to the page
> filemap_write_and_wait_range
Oh no, please ignore this; it can never work. I will try the data
path COW idea instead.

Thanks
Shan Hai
>
> xfs_trans_roll
> convert inode to extent format
> allocate a new block
> commit allocation and inode format conversion
> ----
>
> Thanks
> Shan Hai
>> Hence I think the inline data write path needs to piggy back on the
>> iomap COW path that we use for writing to shared extents if the
>> write would cause a data fork format change.
>>
>> Cheers,
>>
>> Dave.
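
The ordering problem Dave describes above can be illustrated with a toy
crash model. This is only a minimal sketch in plain Python, not kernel
code: the names run, recover, CONVERT_FIRST and WRITE_FIRST, the block
number 7 and the one-record journal are all hypothetical simplifications.
It assumes the journal and the disk block are durable, and that the inline
data stays recoverable until a conversion record has been logged.

```python
# Hypothetical contents of a freshly allocated block that was never written.
STALE = b"old contents of a freed block"


def recover(journal, disk, inline_data):
    """Replay the toy journal and return what a reader of the file sees."""
    for op, block in journal:
        if op == "local_to_extent":
            # The inode is now in extent format and points at this block,
            # whatever the block happens to contain.
            return disk[block]
    # No conversion was committed: the inode is still in local format.
    return inline_data


def run(order, crash_after, inline_data=b"hello", block=7):
    """Execute the first `crash_after` steps of `order`, then crash and recover."""
    journal = []
    disk = {block: STALE}
    steps = {
        # Commit the allocation and the local -> extent format change.
        "log_conversion": lambda: journal.append(("local_to_extent", block)),
        # Data IO to the newly allocated block completes.
        "write_data": lambda: disk.__setitem__(block, inline_data),
    }
    for name in order[:crash_after]:
        steps[name]()
    return recover(journal, disk, inline_data)


# Order attributed to the patchset: log the conversion, then submit the data IO.
CONVERT_FIRST = ["log_conversion", "write_data"]
# COW-style order: data IO first, fork conversion logged at IO completion.
WRITE_FIRST = ["write_data", "log_conversion"]
```

In this model, run(CONVERT_FIRST, 1) recovers the stale block contents,
while run(WRITE_FIRST, n) recovers the original inline data for every
crash point n in 0..2.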