From: Josef Bacik <josef@redhat.com>
To: "Cornelius, Martin (DWBI)" <Martin.Cornelius@smiths-heimann.com>
Cc: linux-fsdevel@vger.kernel.org, "Roeder, Patrick (DWBI)" <Patrick.Roeder@smithsdetection.com>
Subject: Re: Question : are concurrent write() calls with O_APPEND on local files atomic ?
Date: Wed, 19 Aug 2009 09:17:49 -0400
Message-ID: <20090819131749.GA7761@localhost.localdomain>
In-Reply-To: <531F9EE7AD1E874595D59997FD3EAEED052EB15C@COSSMGMBX05.EMAIL.CORP.TLD>
On Wed, Aug 19, 2009 at 06:40:33AM -0600, Cornelius, Martin (DWBI) wrote:
>
> Hi linux-filesystem experts
>
> First, please excuse us if this is the wrong place to ask this question
> -- we googled around a lot and couldn't find an answer, that's why we
> finally try it here.
>
> The actual cause of the question is our reasoning about the robustness
> of the openssh code. Every invocation of ssh possibly adds a line to the
> file $(HOME)/.ssh/known_hosts, and (contrary to our expectations) we
> couldn't find any explicit locking in the code. Instead, the ssh code
> just opens the file with O_APPEND, writes to the file, and closes it. We
> already conducted a simple test that tries to create a 'corrupted'
> known_hosts file by starting lots of ssh commands concurrently, but so
> far we could not observe corruption. We now wonder if this is just by
> luck or if a programmer can rely on this behaviour.
>
> The generalized question is: If two (or more) different processes open
> the same file on a !LOCAL! disk with O_APPEND, and then concurrently
> issue write() calls to store data into this file, is there any guarantee
> that the data of each single write() call are written 'atomically', or
> could it happen that the data of different write()s are mangled or one
> write() overwrites data already written ? To prevent misunderstandings,
> we assume that ALL writers have opened the file with O_APPEND, and all
> write calls return normally without being interrupted by a signal.
>
So looking at the code, with O_APPEND set, every time the app calls write() the
position it's writing to is set to the end of the file. It looks like most
filesystems (with the exception of btrfs) hold inode->i_mutex when they call
generic_write_checks(), which determines the position to write to. So setting
the position and the subsequent write happen atomically, and unless the fs is
btrfs (which may or may not be a bug, I'll leave that to the smarter people),
O_APPEND writes should appear to be atomic.
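As a minimal sketch of the pattern under discussion (this is not the actual
openssh code; the helper name append_record() and the 0600 mode are assumptions
made for illustration), a writer that wants each record to land in one piece
would open with O_APPEND and hand the whole record to a single write() call:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: append one complete record to `path`.
 * The whole record goes out in a single write() so that the
 * "seek to end + write" step happens once per record.
 * Returns 0 on success, -1 on any error or short write.
 */
int append_record(const char *path, const char *record)
{
    /* O_APPEND: the kernel repositions to end-of-file on every write(). */
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    size_t len = strlen(record);
    ssize_t n = write(fd, record, len);

    /* Treat a short write as failure instead of retrying the remainder. */
    int rc = (n == (ssize_t)len) ? 0 : -1;

    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

A short write is reported as an error rather than retried here, because a
retry would be a second write() and another appender could land between the
two pieces.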
> The POSIX standard states that advancing the file pointer to the end of
> the file and the following execution of the write are performed
> atomically with O_APPEND, but as far as we grasp it does not state if
> the actual write is also atomic w.r.t. other concurrent write calls.
>
> If there is some guarantee :
> - does a (perhaps filesystem dependent) limit for this guarantee exist ?
> (like the PIPE_BUF size limit when writing to a pipe), and is there a
> way to detect this limit programmatically ?
Like I said, it seems most filesystems hold the i_mutex when doing the check,
but it appears btrfs does not. I think it's a bug, but I'm not sure. There
would not be a way to tell programmatically.
> - does this guarantee also hold, if several threads in one process write
> to a single file DESCRIPTOR concurrently ?
Yes, the position is set every single time write() is called.
> - does this guarantee also hold for remote filesystems (nfs / smb) ?
>
This I'm more likely to be wrong on, but I don't think so. It would be atomic
on the local machine, but if there is somebody else on another machine writing
to the same file I think you would probably be screwed.
> If the answer to the last question is 'no' : is there a simple way to
> programmatically detect whether the guarantee holds for a specific file
> ?
I don't think so. Really your best bet if you are going to use a remote fs that
can have concurrent writers that have no knowledge of each other is to use
fcntl() locking.
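For the fcntl route, a rough sketch, assuming every writer cooperates and takes
the same advisory lock (the name append_locked() is hypothetical, and whether
the lock is honoured across machines depends on the remote filesystem and its
lock support):

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: append `len` bytes to `path` while holding an
 * exclusive advisory lock on the whole file.
 * Returns 0 on success, -1 on error.
 */
int append_locked(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;     /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 = through end of file, i.e. whole file */

    /* F_SETLKW blocks until the lock is granted; retry if interrupted. */
    while (fcntl(fd, F_SETLKW, &fl) < 0) {
        if (errno != EINTR) {
            close(fd);
            return -1;
        }
    }

    /* With the lock held, looping over short writes is safe. */
    int rc = 0;
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {
            rc = -1;
            break;
        }
        done += (size_t)n;
    }

    /* Release the lock explicitly, then close. */
    fl.l_type = F_UNLCK;
    if (fcntl(fd, F_SETLK, &fl) < 0)
        rc = -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

The lock is advisory, so it only helps if all writers use it; a process that
just calls write() without taking the lock is not held back.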
Thanks,
Josef