* [PATCH] notes on volatile write caches vs fdatasync
From: Christoph Hellwig @ 2009-08-27 1:16 UTC
To: linux-fsdevel, linux-kernel
Cc: chris.mason, jack, tytso, adilger, swhiteho, konishi.ryusuke,
mfasheh, joel.becker
There are two related issues when dealing with volatile write caches.
The popular and beaten-to-death one is write barriers, which guarantee
write ordering and stable storage for log writes. For this post I
naively assume they work perfectly on all filesystems supporting them.

The second issue is plain cache flushes. Yes, they happen to be the
basis for the barrier implementation on all common disks in Linux, but
there are cases where we need to issue them even without a log barrier.

Think about a plain write into a file that is already fully allocated,
or the O_DIRECT version of the same. If we do an fdatasync after these
we really do expect the data to be on disk, not just in the disk
cache, right? The same is true for O_SYNC, but I ignore it for this
write-up because with Jan's patch series O_SYNC writes will be
implemented as a range fdatasync after the actual write, so everything
said here about fdatasync covers it, too.
It appears the following Linux filesystems implement barrier support:
- btrfs
- ext3
- ext4
- gfs2
- nilfs2
- ocfs2
- reiserfs
- xfs
Interestingly, of those only ext4, reiserfs and xfs contain direct
calls to blkdev_issue_flush. And unless a filesystem creates a
transaction for every write and forces it out on fdatasync, it seems
the others have no chance of guaranteeing a cache flush on fdatasync.
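For illustration only, a rough sketch of what the missing call would
look like in a ->fsync implementation. None of this is taken from the
filesystems above: "examplefs" is made up, the use of simple_fsync is
just one possible choice, and I'm assuming the current two-argument
blkdev_issue_flush() and the (file, dentry, datasync) ->fsync prototype:

        /* sketch only -- "examplefs" is hypothetical; needs <linux/fs.h>
         * and <linux/blkdev.h> */
        static int examplefs_fsync(struct file *file, struct dentry *dentry,
                                   int datasync)
        {
                struct inode *inode = dentry->d_inode;
                int ret, err;

                /*
                 * Write out the inode and associated metadata buffers; the
                 * data pages have already been written back by the caller
                 * in the fsync path.
                 */
                ret = simple_fsync(file, dentry, datasync);

                /*
                 * For a plain overwrite no transaction commit forced a
                 * barrier, so push the volatile write cache out explicitly.
                 */
                err = blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
                if (err == -EOPNOTSUPP)
                        err = 0;        /* no cache flush support on this device */
                if (!ret)
                        ret = err;
                return ret;
        }

reiserfs and xfs effectively do the equivalent of that last call today,
which is why they get the already-allocated case right.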
I have tested btrfs, ext3, ext4, reiserfs, and xfs with a simple test
program that just does a buffered write into a file, and then calls
fdatasync. All of the above filesystems issue a barrier request
when the file blocks aren't allocated yet (for ext3 and reiserfs
only when barriers are explicitly enabled, of course).
That's no longer the case when all blocks are already allocated.
As expected from the grep results above, reiserfs and xfs still issue
a barrier in that case. Btrfs also performs a cache flush in every
case, which at first seems unexpected given the lack of any
blkdev_issue_flush call, but since btrfs is a COW filesystem it
actually has to allocate blocks even for an overwrite.
Ext3, as expected, does not issue a cache flush in that case, but ext4
unexpectedly does not issue one either. The reason is that it only
issues the cache flush if the inode is dirty, and not at all otherwise.
* Re: [PATCH] notes on volatile write caches vs fdatasync
From: Christoph Hellwig @ 2009-08-27 1:19 UTC
To: linux-fsdevel, linux-kernel
Cc: chris.mason, jack, tytso, adilger, swhiteho, konishi.ryusuke,
mfasheh, joel.becker
Not actually a patch, sorry ;-)
* Re: [PATCH] notes on volatile write caches vs fdatasync
From: Jan Kara @ 2009-08-27 13:02 UTC
To: Christoph Hellwig
Cc: linux-fsdevel, linux-kernel, chris.mason, jack, tytso, adilger,
swhiteho, konishi.ryusuke, mfasheh, joel.becker
Hi,
On Thu 27-08-09 03:16:24, Christoph Hellwig wrote:
> There are two related issues when dealing with volatile write caches.
> The popular and beaten-to-death one is write barriers, which guarantee
> write ordering and stable storage for log writes. For this post I
> naively assume they work perfectly on all filesystems supporting them.
>
> The second issue is plain cache flushes. Yes, they happen to be the
> basis for the barrier implementation on all common disks in Linux, but
> there are cases where we need to issue them even without a log barrier.
>
> Think about a plain write into a file that is already fully allocated,
> or the O_DIRECT version of the same. If we do an fdatasync after these
> we really do expect the data to be on disk, not just in the disk
> cache, right? The same is true for O_SYNC, but I ignore it for this
> write-up because with Jan's patch series O_SYNC writes will be
> implemented as a range fdatasync after the actual write, so everything
> said here about fdatasync covers it, too.
I've noticed this as well when we were tracking some problems Pavel
Machek found with his USB stick. I even wrote a patch at the time
http://osdir.com/ml/linux-ext4/2009-01/msg00015.html
but it somehow died out. Now the situation should be simpler with the
fsync paths cleaned up... BTW: people wanted this to be configurable per
block device, which probably makes sense...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: notes on volatile write caches vs fdatasync
From: Christoph Hellwig @ 2009-08-27 18:49 UTC
To: Jan Kara
Cc: linux-fsdevel, linux-kernel, chris.mason, tytso, adilger,
swhiteho, konishi.ryusuke, mfasheh, joel.becker
On Thu, Aug 27, 2009 at 03:02:52PM +0200, Jan Kara wrote:
> I've noticed this as well when we were tracking some problems Pavel
> Machek found with his USB stick. I even wrote a patch at the time
> http://osdir.com/ml/linux-ext4/2009-01/msg00015.html
> but it somehow died out. Now the situation should be simpler with the
> fsync paths cleaned up... BTW: people wanted this to be configurable per
> block device, which probably makes sense...
Yeah, that patch is pretty ugly. We need to do these cache flushes
in ->fsync (and in ->sync_fs if any filesystem really doesn't guarantee
to issue a transaction there after the data has been written). Adding
it to simple_fsync too sounds good to me.
* Re: notes on volatile write caches vs fdatasync
From: Jeff Garzik @ 2009-08-27 19:26 UTC
To: Christoph Hellwig
Cc: Jan Kara, linux-fsdevel, linux-kernel, chris.mason, tytso,
adilger, swhiteho, konishi.ryusuke, mfasheh, joel.becker
On 08/27/2009 02:49 PM, Christoph Hellwig wrote:
> On Thu, Aug 27, 2009 at 03:02:52PM +0200, Jan Kara wrote:
>> I've noticed this as well when we were tracking some problems Pavel
>> Machek found with his USB stick. I even wrote a patch at the time
>> http://osdir.com/ml/linux-ext4/2009-01/msg00015.html
>> but it somehow died out. Now the situation should be simpler with the
>> fsync paths cleaned up... BTW: people wanted this to be configurable per
>> block device, which probably makes sense...
>
> Yeah, that patch is pretty ugly. We need to do these cache flushes
> in ->fsync (and in ->sync_fs if any filesystem really doesn't guarantee
> to issue a transaction there after the data has been written). Adding
> it to simple_fsync too sounds good to me.
Agreed. That was the direction I was heading with my patch[1]. The last
feedback I got on it was that I needed to add a knob to optionally
disable this new cache-flush behavior.
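Purely to illustrate the shape such a knob could take (all names below
are hypothetical and not from any existing patch):

        /* hypothetical sketch -- "examplefs" and the field names are made up */
        static int examplefs_flush_device(struct super_block *sb)
        {
                struct examplefs_sb_info *sbi = sb->s_fs_info;

                /*
                 * The knob: the admin disabled explicit cache flushes,
                 * e.g. because the write cache is battery-backed or the
                 * device is configured write-through.
                 */
                if (sbi->no_cache_flush)
                        return 0;

                return blkdev_issue_flush(sb->s_bdev, NULL);
        }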
Jeff
[1] http://lkml.org/lkml/2009/3/27/366