From: "Patrick J. LoPresti" <patl@curl.com>
To: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts
Date: 15 Jul 2002 12:10:37 -0400
Message-ID: <s5g4rf1t1j6.fsf@egghead.curl.com>
In-Reply-To: <mit.lcs.mail.linux-kernel/20020715151833.GA22828@merlin.emma.line.org>
Matthias Andree <matthias.andree@stud.uni-dortmund.de> writes:
> The data= mode was not part of the past discussion, that's why I
> brought this up now. However, reiserfs or ext3fs with data=writeback
> only journal the fsync() metadata involved, not the order of data
> (file contents) versus directory contents, so you can end up with a
> "crash - journal replay - file with bogus contents" scenario.
This should not happen with a properly written application. fsync()
flushes a bunch of stuff to disk, but it normally makes no promise
about the ORDER in which that stuff goes out. fsync() itself is how
application authors can enforce an ordering on disk operations.
For example, a typical MTA might follow this paradigm:
    write temp file
    fsync()
    rename temp file to destination
    fsync()
    report success
(Yes, I know, "link/unlink" is more common in practice than rename().
But the principle is the same.)
Or, in the case of Postfix:
    write message file
    fsync()
    chmod +x message file
    fsync()
    report success
The first paradigm uses the presence of a directory entry to represent
"committed" data. The second uses a mode bit on the file.
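As a rough illustration of the second (mode bit) paradigm, here is a
minimal C sketch. The function name commit_by_mode_bit, the helper
write_all, and the 0600/0700 mode values are assumptions made for the
example; this is not Postfix's actual code. It also does not flush the
directory entry created by open(), which is outside what the steps
above describe.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: store a message and mark it "committed" by
 * setting the owner execute bit. Returns 0 on success, -1 on error.
 */
int commit_by_mode_bit(const char *path, const char *msg, size_t len)
{
    /* O_EXCL: refuse to overwrite an existing message file. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    if (write_all(fd, msg, len) < 0 || /* write message file          */
        fsync(fd) < 0 ||               /* data on disk before the mark */
        fchmod(fd, 0700) < 0 ||        /* "chmod +x": mark committed   */
        fsync(fd) < 0) {               /* mark on disk before success  */
        close(fd);
        return -1;
    }
    return close(fd);                  /* caller may now report success */
}
```

A caller would report success to the remote side only after this
function returns 0, e.g. commit_by_mode_bit("/tmp/example.msg", body,
body_len), where the path is purely illustrative.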
Both of these paradigms work fine with data=writeback. Yes, they
require calling fsync() twice, but that is exactly what you need to
enforce the ordering constraints!
An MTA has two ordering constraints:
  1) Data must be flushed to disk before it is marked on disk as
     "committed". This is to ensure that, after a crash, the MTA does
     not read a corrupted mail file.
  2) Data must be marked on disk as "committed" before a success code
     is reported to the remote MTA. This is to ensure that no mail is
     lost.
The ext3 data=ordered mode enforces the first constraint for mailers
using the "rename" paradigm, eliminating the need for the first
fsync() call. But any MTA which relies on data=ordered semantics is
not only Linux-specific, but ext3/reiserfs-specific!
Synchronous directory updates, a la FFS, enforce the second constraint
(again for the "rename" paradigm), eliminating the need for the second
fsync().
But to be robust across platforms and file systems, a mailer needs
both fsync() calls. (On Linux, you actually need to fsync() the
*directory*, not the file, for the "rename" paradigm. It would be
nice if we could convince MTA authors to do this.)
> I don't think so. They'd rather declare ReiserFS unsupported and go with
> chattr +S. Seen that.
>
> New implementations (Courier's maildrop) still rely on BSD FFS
> "synchronous directory" semantics.
Are you sure? Because that is ridiculous... Modern BSDs like to use
"soft updates", which need that second fsync() to commit the metadata.
So as long as fsync() commits the journal, either paradigm above
should work fine under any journaling mode.
Summary: *All* MTAs should call fsync() twice. The only issue is what
descriptors they should call it on, exactly :-).
- Pat