From: Steven Whitehouse <swhiteho@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] fallocate vs O_(D)SYNC
Date: Wed, 16 Nov 2011 11:20:58 +0000
Message-ID: <1321442458.2713.34.camel@menhir>
In-Reply-To: <20111116105413.GA2916@quack.suse.cz>

Hi,

On Wed, 2011-11-16 at 11:54 +0100, Jan Kara wrote:
> Hello,
>
> On Wed 16-11-11 09:43:08, Steven Whitehouse wrote:
> > On Wed, 2011-11-16 at 03:42 -0500, Christoph Hellwig wrote:
> > > It seems all filesystems but XFS ignore O_SYNC for fallocate, and never
> > > make sure the size update transaction made it to disk.
> > >
> > > Given that a fallocate without FALLOC_FL_KEEP_SIZE very much is a data
> > > operation (it adds new blocks that return zeroes) that seems like a
> > > fairly nasty surprise for O_SYNC users.
> >
> > In GFS2 we zero out the data blocks as we go (since our metadata doesn't
> > allow us to mark blocks as zeroed at alloc time) and also because we are
> > mostly interested in being able to do FALLOC_FL_KEEP_SIZE which we use
> > on our rindex system file in order to ensure that there is always enough
> > space to expand a filesystem.
> >
> > So there is no danger of having non-zeroed blocks appearing later, as
> > that is done before the metadata change.
> >
> > Our fallocate_chunk() function calls mark_inode_dirty(inode) on each
> > call, so that fsync should pick that up and ensure that the metadata has
> > been written back. So we should thus have both data and metadata stable
> > on disk.
> >
> > Do you have some evidence that this is not happening?
> Yeah, only that nobody calls that fsync() automatically if the fd is
> O_SYNC if I'm right. But maybe calling fdatasync() on the range which was
> fallocated from sys_fallocate() if the fd is O_SYNC would do the trick for
> most filesystems? That would match how we treat O_SYNC for other operations
> as well. I'm just not sure whether XFS wouldn't take unnecessarily big hit
> with this.
>
> Honza

Ah, I see now. Sorry, I missed the original point. So that would just be
a VFS addition to check the O_(D)SYNC flag, as you suggest. I've no
objections to that; it makes sense to me.

Steve.
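
Until a check like that exists in the VFS, an application that depends on
O_(D)SYNC semantics can do the equivalent by hand. A minimal userspace
sketch, assuming Linux and the glibc fallocate(2) wrapper; the helper name
fallocate_synced() is made up for illustration and is not an existing API.
Note that fdatasync(2) flushes the whole file, not just the fallocated
range, so this is coarser than the per-range sync suggested above:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a kernel or libc interface): preallocate a
 * range and, if the descriptor was opened with O_SYNC or O_DSYNC, flush
 * explicitly so the allocation and size update reach stable storage.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int fallocate_synced(int fd, int mode, off_t offset, off_t len)
{
    int flags;

    assert(len > 0); /* fallocate(2) rejects a zero or negative length */

    if (fallocate(fd, mode, offset, len) != 0)
        return -1;

    /* Read back the file status flags the descriptor was opened with. */
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;

    /*
     * On Linux, O_SYNC is defined to include the O_DSYNC bit, so a
     * single test on O_DSYNC covers both flags.
     */
    if (flags & O_DSYNC)
        return fdatasync(fd);

    return 0;
}
```

A caller would open the file with O_SYNC (or O_DSYNC) and use
fallocate_synced(fd, 0, offset, len) wherever it currently calls
fallocate(). For a descriptor opened without either flag the helper
behaves exactly like a plain fallocate().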