From: Chao Yu <chao@kernel.org>
To: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Eric Biggers <ebiggers@kernel.org>, Theodore Ts'o <tytso@mit.edu>,
linux-f2fs-devel@lists.sourceforge.net,
linux-fsdevel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] f2fs: remove broken support for allocating DIO writes
Date: Tue, 3 Aug 2021 09:19:11 +0800
Message-ID: <b88328b4-db3e-0097-d8cc-f250ee678e5b@kernel.org>
In-Reply-To: <YQg4Lukc2dXX3aJc@google.com>
On 2021/8/3 2:23, Jaegeuk Kim wrote:
> On 08/02, Chao Yu wrote:
>> On 2021/8/2 12:39, Eric Biggers wrote:
>>> On Fri, Jul 30, 2021 at 10:46:16PM -0400, Theodore Ts'o wrote:
>>>> On Fri, Jul 30, 2021 at 12:17:26PM -0700, Eric Biggers wrote:
>>>>>> Currently, non-overwrite DIO writes are fundamentally unsafe on f2fs as
>>>>>> they require preallocating blocks, but f2fs doesn't support unwritten
>>>>>> blocks and therefore has to preallocate the blocks as regular blocks.
>>>>>> f2fs has no way to reliably roll back such preallocations, so as a
>>>>>> result, f2fs will leak uninitialized blocks to users if a DIO write
>>>>>> doesn't fully complete.
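To make this failure mode concrete, here is a toy Python model. It is not f2fs code, and the class and method names are hypothetical. It assumes that freed blocks keep their old contents and that preallocated blocks are mapped into the file immediately, since there is no unwritten state. A DIO write that fails part way then leaves a mapped block that still holds another file's stale data.

```python
# Toy model of the leak described above; hypothetical names, not f2fs code.
# Assumptions: freed blocks keep their old contents, and preallocated blocks
# are mapped into the file at once because there is no "unwritten" state.

STALE = b"old secret"  # data left behind by some previously deleted file


class ToyFS:
    def __init__(self, nblocks):
        self.disk = [STALE] * nblocks
        self.free = set(range(nblocks))
        self.block_map = {}  # (file name, logical block) -> physical block

    def dio_write(self, name, data_blocks, fail_after=None):
        # Step 1: preallocate regular blocks and map them into the file.
        for lblk in range(len(data_blocks)):
            pblk = min(self.free)
            self.free.remove(pblk)
            self.block_map[(name, lblk)] = pblk
        # Step 2: issue the data writes; an error may stop this part way.
        for lblk, data in enumerate(data_blocks):
            if fail_after is not None and lblk >= fail_after:
                return False  # nothing rolls back the mapping from step 1
            self.disk[self.block_map[(name, lblk)]] = data
        return True

    def read(self, name, lblk):
        return self.disk[self.block_map[(name, lblk)]]


fs = ToyFS(nblocks=4)
ok = fs.dio_write("f", [b"new0", b"new1"], fail_after=1)
print(ok, fs.read("f", 0), fs.read("f", 1))  # False b'new0' b'old secret'
```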
>>>>
>>>> There's another way of solving this problem which doesn't require
>>>> supporting unwritten blocks. What a file system *could* do is to
>>>> allocate the blocks, but *not* update the on-disk data structures ---
>>>> so the allocation happens in memory only, so you know that the
>>>> physical blocks won't get used for other files, and then issue the
>>>> data block writes. On the block I/O completion, trigger a workqueue
>>>> function which updates the on-disk metadata to assign physical blocks
>>>> to the inode.
>>>>
>>>> That way if you crash before the data I/O has a chance to complete,
>>>> the on-disk logical block -> physical block map hasn't been updated
>>>> yet, and so you don't need to worry about leaking uninitialized blocks.
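The scheme above can be sketched as a toy Python model under stated assumptions: the allocation is a reservation held only in memory, the data write goes straight to the reserved block, and only the completion handler (standing in for the workqueue function) updates the persisted block map and allocation bitmap. All names are hypothetical; this is not ext4 or f2fs code.

```python
# Toy model of allocate-in-memory, map-on-completion; hypothetical names,
# not ext4 or f2fs code. "Persisted" fields are what survives a crash.


class DeferredMapFS:
    def __init__(self, nblocks):
        self.disk = [b"stale"] * nblocks  # data blocks survive a crash
        self.ondisk_map = {}         # persisted: logical -> physical block
        self.ondisk_used = set()     # persisted allocation bitmap
        self.inmem_reserved = set()  # in-memory only; lost on crash

    def reserve(self, n):
        # Choose blocks that are neither persistently used nor reserved,
        # and record the reservation in memory only.
        busy = self.ondisk_used | self.inmem_reserved
        picked = [b for b in range(len(self.disk)) if b not in busy][:n]
        if len(picked) < n:
            raise OSError("ENOSPC")
        self.inmem_reserved.update(picked)
        return picked

    def submit_write(self, pblk, data):
        self.disk[pblk] = data  # data block write; no metadata touched

    def end_io(self, lblk, pblk):
        # I/O completion (the workqueue step): commit the mapping now.
        self.ondisk_map[lblk] = pblk
        self.ondisk_used.add(pblk)
        self.inmem_reserved.discard(pblk)

    def crash(self):
        # Power loss: reservations vanish, persisted state is untouched.
        self.inmem_reserved.clear()

    def read(self, lblk):
        pblk = self.ondisk_map.get(lblk)
        return None if pblk is None else self.disk[pblk]  # None = hole


fs = DeferredMapFS(4)
p0, p1 = fs.reserve(2)
fs.submit_write(p0, b"new0")
fs.end_io(0, p0)
fs.crash()  # the write for logical block 1 never completed
print(fs.read(0), fs.read(1))  # b'new0' None
```

A crash before completion leaves the logical block as a hole, and the reservation simply disappears. A real implementation would also have to ensure the data is durable before the metadata that points at it is committed.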
>>
>> Thanks for your suggestion, I think it makes sense.
>>
>>>>
>>>> Cheers,
>>>>
>>>> - Ted
>>>
>>> Jaegeuk and Chao, any idea how feasible it would be for f2fs to do this?
>>
>> Firstly, note that the following metadata is touched during the DIO
>> preallocation flow:
>> - log header
>> - sit bitmap/count
>> - free seg/sec bitmap/count
>> - dirty seg/sec bitmap/count
>>
>> And there is one case we need to be concerned about: checkpoint() can be
>> triggered at any time between dio_preallocate() and dio_end_io(), so we must
>> not persist any DIO-preallocation-related metadata during checkpoint();
>> otherwise, a sudden power-cut after the checkpoint will corrupt the filesystem.
>>
>> So we need to cleanly separate two kinds of metadata updates:
>> a) those belonging to DIO preallocation
>> b) everything else
>>
>> After that, the checkpoint() flow can simply flush only metadata b); other
>> flows, such as GC and data/node allocation, need to query/update metadata
>> after combining metadata a) and b).
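The separation of metadata a) and b) can be illustrated with a toy Python model. It assumes a single flat allocation bitmap in place of the real SIT, free-segment and dirty-segment structures, and the class and method names are hypothetical: allocation consults the union of a) and b), while checkpoint() persists only b).

```python
# Toy model of splitting DIO-preallocation state a) from everything else b).
# Hypothetical names; one flat bitmap stands in for the SIT and the
# free/dirty segment bitmaps, so this is not the real f2fs layout.


class SplitAllocModel:
    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.regular_used = set()  # b) ordinary allocations
        self.dio_prealloc = set()  # a) DIO preallocations still in flight
        self.checkpointed = set()  # what the last checkpoint persisted

    def in_use(self, blk):
        # Allocation and GC must see the union of a) and b).
        return blk in self.regular_used or blk in self.dio_prealloc

    def alloc(self, dio=False):
        for blk in range(self.nblocks):
            if not self.in_use(blk):
                (self.dio_prealloc if dio else self.regular_used).add(blk)
                return blk
        raise OSError("ENOSPC")

    def dio_end_io(self, blk):
        # The DIO data is on disk, so the block becomes ordinary b) state.
        self.dio_prealloc.remove(blk)
        self.regular_used.add(blk)

    def checkpoint(self):
        # Persist only b); in-flight DIO preallocations are left out.
        self.checkpointed = set(self.regular_used)


m = SplitAllocModel(4)
m.alloc()              # regular allocation gets block 0
d = m.alloc(dio=True)  # DIO preallocation gets block 1
m.checkpoint()
print(sorted(m.checkpointed), m.alloc())  # [0] 2
```

In this model a block preallocated for DIO is never handed out twice, yet it reaches a checkpoint only after dio_end_io() has moved it into b).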
>>
>> In addition, f2fs already has an in-memory log header framework; based on
>> this framework, it's easy for us to add a new in-memory log header for DIO
>> preallocation.
>>
>> So it seems feasible to me so far...
>>
>> Jaegeuk, any other concerns about the implementation details?
>
> Hmm, I'm still trying to deal with this as a corner case where the writes
> haven't completed due to an error. How about keeping the preallocated block
> offsets and releasing them if we get an error? Don't we just need to handle EIO properly?

What about the case where a checkpoint (CP) and a sudden power-off (SPO)
follow the DIO preallocation? The user will encounter uninitialized blocks
after recovery.

Thanks,
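The difference between the error path and the CP + SPO case can be shown with a toy Python model. It assumes that a checkpoint persists the in-memory block map as-is and that recovery after a sudden power-off restarts from the last checkpoint; all names are hypothetical and this is not f2fs code. Releasing the recorded blocks on EIO covers the error path, but after CP + SPO the list of pending blocks is gone and the checkpointed map still points at a block that was never written.

```python
# Toy model of "remember preallocated blocks, release them on EIO".
# Hypothetical names. Assumptions: checkpoint() persists the in-memory
# block map as-is, and recovery after power loss restarts from it.


class ReleaseOnErrorModel:
    def __init__(self):
        self.disk = {0: b"stale", 1: b"stale"}  # old contents of blocks
        self.mem_map = {}     # in-memory map: logical -> physical block
        self.cp_map = {}      # map as of the last checkpoint (persisted)
        self.pending = set()  # preallocated logical blocks not yet written

    def preallocate(self, lblk, pblk):
        self.mem_map[lblk] = pblk
        self.pending.add(lblk)

    def write_done(self, lblk, data):
        self.disk[self.mem_map[lblk]] = data
        self.pending.discard(lblk)

    def io_error(self):
        # Error path: release every block that is still pending.
        for lblk in self.pending:
            del self.mem_map[lblk]
        self.pending.clear()

    def checkpoint(self):
        self.cp_map = dict(self.mem_map)

    def power_loss(self):
        # SPO: memory (including `pending`) is lost; recover from the CP.
        self.mem_map = dict(self.cp_map)
        self.pending.clear()

    def read(self, lblk):
        pblk = self.mem_map.get(lblk)
        return None if pblk is None else self.disk[pblk]


m = ReleaseOnErrorModel()
m.preallocate(0, 0)
m.io_error()        # EIO: the preallocation is undone, no leak
m.preallocate(1, 1)
m.checkpoint()      # CP persists the preallocated mapping
m.power_loss()      # SPO before the data write completes
print(m.read(0), m.read(1))  # None b'stale'
```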
>
>>
>> Thanks,
>>
>>>
>>> - Eric
>>>
Thread overview: 21+ messages
2021-07-28 1:51 [PATCH] f2fs: remove broken support for allocating DIO writes Eric Biggers
2021-07-30 19:17 ` Eric Biggers
2021-07-30 22:12 ` Jaegeuk Kim
2021-07-30 22:19 ` Eric Biggers
2021-07-31 1:05 ` Jaegeuk Kim
2021-07-31 1:18 ` Eric Biggers
2021-07-31 2:46 ` Theodore Ts'o
2021-08-02 4:39 ` Eric Biggers
2021-08-02 9:00 ` Chao Yu
2021-08-02 18:23 ` Jaegeuk Kim
2021-08-03 1:19 ` Chao Yu [this message]
2021-08-03 1:34 ` Jaegeuk Kim
2021-08-17 2:03 ` Eric Biggers
2021-08-17 5:42 ` Christoph Hellwig
2021-08-17 18:57 ` Jaegeuk Kim
2021-08-17 20:27 ` Eric Biggers
2021-08-17 21:33 ` Jaegeuk Kim
2021-08-18 0:06 ` Eric Biggers
2021-08-20 9:35 ` Chao Yu
2021-08-20 18:11 ` Eric Biggers
2021-08-20 22:01 ` Chao Yu