From: Free Ekanayaka <free.ekanayaka@gmail.com>
To: Jan Kara <jack@suse.cz>, Dave Chinner <david@fromorbit.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: direct I/O: ext4 seems to not honor RWF_DSYNC when journal is disabled
Date: Tue, 09 Jan 2024 15:57:19 +0000
Message-ID: <877ckio5y8.fsf@x1.mail-host-address-is-not-set>
In-Reply-To: <20240109135950.wb2lyclqxvnfzwbk@quack3>
Jan Kara <jack@suse.cz> writes:
[...]
>> I suspect correct crash recovery behaviour here requires
>> multiple cache flushes to ensure the correct ordering of data vs
>> metadata updates. i.e:
>>
>> ....
>> data write completes
>> fdatasync()
>> cache flush to ensure data is on disk
>> if (dirty metadata) {
>> issue metadata write(s) for extent records and inode
>> ....
>> metadata write(s) complete
>> cache flush to ensure metadata is on disk
>> }
>>
>> If we don't flush the cache between the data write and the metadata
>> write(s) that marks the extent as written, we could have a state
>> after a power fail where the metadata writes hit the disk
>> before the data write, and after the system comes back up that file
>> now exposes stale data to the user.
>
> So when we are journalling, we end up doing this (we flush data disk before
> writing and flushing the transaction commit block in jbd2). When we are not
> doing journalling (which is the case here), our crash consistency
> guarantees are pretty weak. We want to guarantee that if fsync(2)
> successfully completed on the file before the crash, user should see the
> data there. But not much more - i.e., stale data exposure in case of crash
> is fully within what sysadmin should expect from a filesystem without a
> journal.
Right, which is exactly the tradeoff I need: weaker guarantees for lower
latency.
All I need is for RWF_DSYNC to keep its promise: once I see a
successful io_uring completion entry, I can be sure that the data has
made it to disk and would survive a power loss.
> After all even if we improved fsync(2) as you suggest, we'd still
> have normal page writeback where we'd have to separate data & metadata
> writes with cache flushes and I don't think the performance overhead is
> something people would be willing to pay.
>
> So yes, nojournal mode is unsafe in case of crash. It is there for people
> not caring about the filesystem after the crash, single user filesystems
> doing data verification in userspace and similar special usecases. Still, I
> think we want at least minimum fsync(2) guarantees if nothing else for
> backwards compatibility with ext2.
I'm doing data verification in user space indeed. As I said, the file has
been pre-allocated with posix_fallocate() and fsync'ed (along with its
dir), so no metadata changes will occur, just the bare write.
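The preparation step can be sketched roughly as follows, assuming Python's
os.posix_fallocate() and os.fsync() wrappers. The function name
preallocate_log and its arguments are hypothetical, and the sketch makes no
claim about whether a given filesystem still updates extent state on the
first write into the preallocated range.

```python
import os


def preallocate_log(path: str, size: int) -> int:
    """Create path, reserve size bytes, and fsync both the file and its directory.

    Returns an open read/write file descriptor; the caller must close it.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Reserve the blocks up front so later writes do not extend the file.
        os.posix_fallocate(fd, 0, size)
        # Persist the file's own data and metadata (size, allocation).
        os.fsync(fd)
        # Persist the directory entry that names the file.
        parent = os.path.dirname(os.path.abspath(path))
        dir_fd = os.open(parent, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    except BaseException:
        os.close(fd)
        raise
    return fd
```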
FWIW the use case is writing the log for an implementation of the Raft
consensus algorithm. So basically a series of sequential writes.
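To illustrate what user-space verification of such a sequential log can look
like, here is a small sketch. The record framing (little-endian length plus
CRC32 of the payload), the size limit, and the function names are all
assumptions made for this example, not the actual on-disk format of the
implementation discussed here.

```python
import struct
import zlib

# Hypothetical framing: 4-byte payload length, 4-byte CRC32 of the payload.
HEADER = struct.Struct("<II")
MAX_ENTRY = 1 << 20  # assumed upper bound, guards against garbage lengths


def encode_entry(payload: bytes) -> bytes:
    """Frame one log entry so a reader can later check its integrity."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def scan_entries(buf: bytes) -> list:
    """Return the payloads of all valid entries from the start of buf.

    Scanning stops at the first record that is empty, oversized, truncated,
    or fails its checksum, so stale or torn data after a crash is ignored.
    """
    entries = []
    offset = 0
    while offset + HEADER.size <= len(buf):
        length, crc = HEADER.unpack_from(buf, offset)
        start = offset + HEADER.size
        if length == 0 or length > MAX_ENTRY or start + length > len(buf):
            break
        payload = buf[start:start + length]
        if zlib.crc32(payload) != crc:
            break
        entries.append(payload)
        offset = start + length
    return entries
```

With a check like this on recovery, stale data past the last durable entry
is detected and discarded rather than trusted.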