linux-fsdevel.vger.kernel.org archive mirror
From: Christoph Hellwig <hch@lst.de>
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: chris.mason@oracle.com, jack@suse.cz, tytso@mit.edu,
	adilger@sun.com, swhiteho@redhat.com,
	konishi.ryusuke@lab.ntt.co.jp, mfasheh@suse.com,
	joel.becker@oracle.com
Subject: Re: [PATCH] notes on volatile write caches vs fdatasync
Date: Thu, 27 Aug 2009 03:19:43 +0200
Message-ID: <20090827011942.GA10541@lst.de>
In-Reply-To: <20090827011624.GA10405@lst.de>

Not actually a patch, sorry ;-)

On Thu, Aug 27, 2009 at 03:16:24AM +0200, Christoph Hellwig wrote:
> There are two related issues when dealing with volatile write caches.
> The popular, beaten-to-death one is write barriers, which guarantee
> write ordering and stable storage for log writes.  For this post
> I naively assume this works perfectly for all filesystems supporting it.
> 
> The second issue is plain cache flushes.  Yes, they happen to be the
> base for the barrier implementation on all common disks in Linux, but
> there are cases where we need to issue them even without a log barrier.
> 
> Think about a plain write into a file that is already fully allocated,
> or the O_DIRECT version of the same.  If we do an fdatasync after these
> we do expect the write to really be on disk, not just in the disk
> cache, right?  The same is true for O_SYNC, but I ignore it for this
> write-up, as with Jan's patch series O_SYNC writes will be implemented
> by a range-fdatasync after the actual write, so after that this
> discussion covers it, too.
> 
> It appears the following Linux filesystems implement barrier support:
> 
>  - btrfs
>  - ext3
>  - ext4
>  - gfs2
>  - nilfs2
>  - ocfs2
>  - reiserfs
>  - xfs
> 
> Interestingly, of those only ext4, reiserfs and xfs contain direct
> calls to blkdev_issue_flush.  And unless a filesystem really creates
> a transaction for every write and forces that out on fdatasync, it
> seems that none of the others has a chance to guarantee a cache
> flush on fdatasync.
> 
> I have tested btrfs, ext3, ext4, reiserfs, and xfs with a simple test
> program that just does a buffered write into a file, and then calls
> fdatasync.  All of the above filesystems issue a barrier request
> when the file blocks aren't allocated yet (for ext3 and reiserfs
> only when barriers are explicitly enabled, of course).
> 
> That's not the case anymore when all blocks are already allocated.
> As expected from the above grep results, reiserfs and xfs still issue a
> barrier in that case.  Btrfs also performs a cache flush in every
> case which at first seems unexpected due to the lack of any
> blkdev_issue_flush call, but given that btrfs is a COW filesystem
> it actually has to allocate blocks even for an overwrite.
> Ext3, as expected, does not issue a cache flush in that case, but ext4
> unexpectedly does not issue a cache flush either.  The reason is
> that ext4 only issues the cache flush if the inode was dirty, and
> skips it entirely otherwise.
---end quoted text---
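
For anyone who wants to repeat the experiment, here is a rough sketch of
the kind of test described above.  It is not the original test program:
the file name, the 4096-byte write size and the helper names are all
made up for illustration.  It does a buffered write into a fresh file
followed by fdatasync (blocks not yet allocated), then overwrites the
same range and calls fdatasync again (blocks already allocated on a
non-COW filesystem).

```python
import os

BLOCK = 4096  # assumed write size; not taken from the original test


def write_and_fdatasync(path, data, offset=0):
    """Do one buffered write of `data` at `offset`, then fdatasync()."""
    # No O_DIRECT and no O_SYNC: this is a plain buffered write.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        written = os.pwrite(fd, data, offset)
        # Ask the kernel to push the file data to stable storage.
        os.fdatasync(fd)
        return written
    finally:
        os.close(fd)


def run_both_cases(directory):
    """Run the 'unallocated' and the 'already allocated' case in turn."""
    path = os.path.join(directory, "fdatasync-test.dat")  # hypothetical name
    # Case 1: new file, so the blocks have to be allocated.
    first = write_and_fdatasync(path, b"a" * BLOCK)
    # Case 2: overwrite the same range; no new blocks are needed unless
    # the filesystem is copy-on-write.
    second = write_and_fdatasync(path, b"b" * BLOCK)
    return path, first, second
```

The script only performs the writes.  Whether a barrier or cache flush
actually reaches the device has to be observed separately at the block
layer, for example with a tracing tool such as blktrace, with
run_both_cases() pointed at a directory on the filesystem under test.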

Thread overview: 5+ messages
2009-08-27  1:16 [PATCH] notes on volatile write caches vs fdatasync Christoph Hellwig
2009-08-27  1:19 ` Christoph Hellwig [this message]
2009-08-27 13:02 ` Jan Kara
2009-08-27 18:49   ` Christoph Hellwig
2009-08-27 19:26     ` Jeff Garzik