From: Theodore Ts'o <tytso@mit.edu>
To: "Sidorov, Andrei" <Andrei.Sidorov@arrisi.com>
Cc: "Joseph D. Wagner" <joe@josephdwagner.info>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
Ryan Lortie <desrt@desrt.ca>
Subject: Re: ext4 file replace guarantees
Date: Sat, 22 Jun 2013 08:56:04 -0400
Message-ID: <20130622125604.GD4727@thunk.org>
In-Reply-To: <C0F0BC787567C848B2C90989451123DA2363CAFF@ATLEXMBX4.ARRS.ARRISI.com>
On Fri, Jun 21, 2013 at 09:49:26PM +0000, Sidorov, Andrei wrote:
> But there is no need to mount entire fs with data journalling mode.
> In fact I find per-file data journalling extremely useful. It would
> be even more useful if it allowed regular users to set journalling
> mode on specific file and there was some way to designate rewrite
> transaction boundaries (even 128k would cover a lot of
> small-but-important-file use cases).
Note that at the moment, the +j flag is only honored in nodelalloc
mode. Since delayed allocation is enabled by default, the per-file
data journal flag is ignored. This is something that we could fix, in
theory. It would be possible to teach ext4_writepages how to allocate
the block(s) and write the data block(s) in the same journal
transaction --- but that functionality does not exist today.
So if you want to use the +j flag, you have to mount the file system
with the non-standard nodelalloc mount option. And that's actually
sufficient to be bug-for-bug compatible with ext3, in the sense that
committing the transaction which contains the rename operation first
forces the file's data out to disk.
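
As an illustration of the two conditions above (nodelalloc mount plus
the +j flag), here is a minimal Python sketch. It is not from the
original message. The function names are hypothetical, the ioctl
numbers assume a 64-bit Linux ABI, and it assumes that nodelalloc shows
up in the options column of /proc/mounts when it is in effect. Setting
the flag needs elevated privileges, as the quoted text notes.

```python
import array
import fcntl
import os

# Values from <linux/fs.h>; the ioctl numbers assume a 64-bit Linux ABI.
FS_IOC_GETFLAGS = 0x80086601
FS_IOC_SETFLAGS = 0x40086602
FS_JOURNAL_DATA_FL = 0x00004000  # the attribute behind "chattr +j"


def mount_options_for(path, mounts_text):
    """Return the options of the deepest mount point containing path.

    path must be absolute; mounts_text is the content of /proc/mounts.
    Mount points with escaped characters (e.g. spaces) are not handled.
    """
    best, best_opts = "", []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mnt = fields[1]
        prefix = mnt.rstrip("/") + "/"
        # ">=" lets a later mount on the same directory win.
        if (path == mnt or path.startswith(prefix)) and len(mnt) >= len(best):
            best, best_opts = mnt, fields[3].split(",")
    return best_opts


def set_journal_data_flag(path):
    """Set the per-file data journalling flag on path."""
    fd = os.open(path, os.O_RDONLY)
    try:
        flags = array.array("i", [0])
        fcntl.ioctl(fd, FS_IOC_GETFLAGS, flags, True)  # read current flags
        flags[0] |= FS_JOURNAL_DATA_FL
        fcntl.ioctl(fd, FS_IOC_SETFLAGS, flags)  # write them back
    finally:
        os.close(fd)


def enable_per_file_journalling(path):
    """Refuse to set +j where delayed allocation would make it a no-op."""
    path = os.path.realpath(path)
    with open("/proc/mounts") as f:
        options = mount_options_for(path, f.read())
    if "nodelalloc" not in options:
        raise RuntimeError("file system is not mounted with nodelalloc")
    set_journal_data_flag(path)
```

A caller would run something like enable_per_file_journalling("/data/app/state")
once, after creating the file; the path here is only a placeholder.
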
Although as both I and Dave Chinner have pointed out, it's a bad idea
for generic applications to depend on file system implementation
details, because we do reserve the right to change those details
if it would help improve the file system's performance or reliability.
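
For comparison, a file replacement that does not lean on any particular
file system's commit behaviour usually makes the ordering explicit with
fsync. The sketch below is an illustration only and is not taken from
this message. The function name and the ".tmp" suffix are hypothetical,
and error handling is kept to a minimum.

```python
import os


def replace_file_durably(path, data):
    """Replace path with data, syncing the new contents before the rename."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # hypothetical temporary name
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # new contents reach stable storage before the rename
    finally:
        os.close(fd)
    os.rename(tmp_path, path)  # atomically switch the name to the new file
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # persist the directory entry change as well
    finally:
        os.close(dir_fd)
```

The cost is an explicit fsync per replacement. In exchange, the
application no longer needs the rename commit to flush the data
implicitly.
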
> As for now it is a best choice for app running with root privileges
> for rewriting files <= page size.
The best choice for an application rewriting files <= a single 4k
block is to use O_DIRECT to rewrite the contents of the file, using a
4k buffer which is zero padded. This is the most performant approach,
uses the fewest write cycles on an SSD, etc.
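
A rough sketch of that O_DIRECT rewrite in Python follows, as an
illustration rather than a reference implementation. The function names
are hypothetical. It assumes the file already exists with its single
block allocated, and that the file system supports O_DIRECT. An
anonymous mmap is used only as a convenient way to get a page-aligned,
zero-filled buffer. The sketch issues no fsync or fdatasync; whether a
cache flush is also wanted is not something the message addresses.

```python
import mmap
import os

BLOCK_SIZE = 4096  # the 4k block size from the text


def padded_block(data):
    """Return a page-aligned, zero-padded BLOCK_SIZE buffer holding data."""
    if len(data) > BLOCK_SIZE:
        raise ValueError("data does not fit in a single block")
    buf = mmap.mmap(-1, BLOCK_SIZE)  # anonymous mapping, zero-filled
    buf[:len(data)] = data
    return buf


def rewrite_block_direct(path, data):
    """Overwrite the first block of an existing file in place via O_DIRECT."""
    buf = padded_block(data)
    try:
        fd = os.open(path, os.O_WRONLY | os.O_DIRECT)
        try:
            written = os.pwrite(fd, buf, 0)  # one aligned, block-sized write
            if written != BLOCK_SIZE:
                raise OSError("short O_DIRECT write")
        finally:
            os.close(fd)
    finally:
        buf.close()
```

Because the block is zero padded, a reader needs its own way to know
where the payload ends, for example a length field or a format that
tolerates trailing NUL bytes.
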
- Ted