From: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
To: Christoph Hellwig <hch@infradead.org>,
Jamie Lokier <jamie@shareable.org>
Cc: mtk.manpages@gmail.com, Heinrich Schuchardt <xypron.glpk@gmx.de>,
linux-man@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
Theodore T'so <tytso@mit.edu>,
Linux-Fsdevel <linux-fsdevel@vger.kernel.org>,
Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [PATCH] fsync_range, was: Re: munmap, msync: synchronization
Date: Wed, 23 Apr 2014 16:33:06 +0200
Message-ID: <5357CF22.2090900@gmail.com>
In-Reply-To: <20140422092837.GA6191@infradead.org>
On 04/22/2014 11:28 AM, Christoph Hellwig wrote:
> On Tue, Apr 22, 2014 at 08:04:21AM +0100, Jamie Lokier wrote:
>> Hi Christoph,
>>
>> Hardly research, I just did a quick Google and was surprised to find
>> some results. AIX API differs from the BSDs; the BSDs seem to agree
>> with each other. fsync_range(), with a flag parameter saying what type
>> of sync, and whether it flushes the storage device write cache as well
>> (because they couldn't agree that was good - similar to the barriers
>> debate).
>
> There is no FreeBSD implementation, I think you were confused by FreeBSD
> also hosting NetBSD man pages on their site, just as I initially was.
>
> The APIs are mostly the same, except that AIX reuses O_ flags as
> argument and NetBSD has a separate namespace. Following the latter
> seems more sensible, and also allows developers to define the separate
> name to the O_ flag for portability.
>
>> As for me doing it, no, sorry, I haven't touched the kernel in a few
>> years, life's been complicated for non-technical reasons, and I don't
>> have time to get back into it now.
>
> I've cooked up a patch, but I really need someone to test it and promote
> it. Find the patch attached. There are two differences to the NetBSD
> one:
>
> 1) It doesn't fail for read-only FDs. fsync doesn't, and while
> standards used to have fdatasync and aio_fsync fail for them,
> Linux never did and the standards are catching up:
>
> http://austingroupbugs.net/view.php?id=501
> http://austingroupbugs.net/view.php?id=671
>
> 2) I don't implement the FDISKSYNC. Requiring it is utterly broken,
> and we wouldn't even have the infrastructure for it. It might make
> sense to provide it defined to 0 so that we have the identifier but
> make it a no-op.
>
>> In the kernel, I was always under the impression the simple part of
>> fsync_range - writing out data pages - was solved years ago, but being
>> sure the filesystem's updated its metadata in the proper way, that
>> begs for a little research into what filesystems do when asked,
>> doesn't it?
>
> The filesystems I care about handle it fine, and while I don't know
> the details of others they better handle it properly, given that we
> use vfs_fsync_range to implement O_SYNC/O_DSYNC writes and commits
> from the nfs server.
The functionality sounds like it would be worthwhile. I've applied the
patch against 3.15-rc2, and employed the test program below, with test
files on a standard laptop HDD (ext4). The test program repeatedly
a) overwrites a specified region of a file
b) does an fsync_range() on a specified range of the file (need not be
the same region that was written).
The CLI is crude, but the arguments are:
1: pathname
2: number of loops
3: Starting point for writes each time round loop
4: Length of region to write
5: Either 'f' for FFILESYNC or 'd' for FDATASYNC
6: start offset for fsync_range()
7: length for fsync_range()
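The loop described above can be sketched roughly as follows. This is a
minimal sketch under stated assumptions, not the program used for the
timings: fsync_range() is not available in mainline kernels, so
fsync_range_fallback() is a hypothetical stand-in that ignores the range and
calls fdatasync() or fsync() on the whole file; the 64 kB buffer size is an
assumption inferred from the reported write counts (16 writes per 1 MB pass);
and sync_loop() and its parameters are illustrative names only. The flag
values are the ones printed in the runs in this mail.

```c
#define _XOPEN_SOURCE 700   /* for pwrite() and fdatasync() declarations */
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Flag values as printed in the runs in this mail (0x10 / 0x20). */
#ifndef FDATASYNC
#define FDATASYNC 0x10
#endif
#ifndef FFILESYNC
#define FFILESYNC 0x20
#endif

#define BUF_SIZE 65536      /* assumption: inferred from the write counts */

/*
 * Hypothetical stand-in for fsync_range(): it ignores the range and syncs
 * the whole file, so it cannot reproduce the partial-range timings.
 */
static int fsync_range_fallback(int fd, int how, off_t start, off_t length)
{
    (void) start;
    (void) length;
    return (how == FDATASYNC) ? fdatasync(fd) : fsync(fd);
}

/*
 * 'loops' times: overwrite [woff, woff + wlen) in BUF_SIZE chunks, then
 * sync [soff, soff + slen). The number of writes and syncs performed is
 * returned via nwrites and nsyncs. Returns 0 on success, -1 on error.
 */
static int sync_loop(int fd, int loops, off_t woff, size_t wlen,
                     int how, off_t soff, off_t slen,
                     long *nwrites, long *nsyncs)
{
    assert(how == FDATASYNC || how == FFILESYNC);

    char *buf = calloc(1, BUF_SIZE);
    if (buf == NULL)
        return -1;

    *nwrites = 0;
    *nsyncs = 0;
    for (int j = 0; j < loops; j++) {
        size_t done = 0;
        while (done < wlen) {
            size_t left = wlen - done;
            size_t chunk = left < BUF_SIZE ? left : BUF_SIZE;
            ssize_t n = pwrite(fd, buf, chunk, woff + (off_t) done);
            if (n <= 0) {
                free(buf);
                return -1;
            }
            done += (size_t) n;
            (*nwrites)++;
        }
        if (fsync_range_fallback(fd, how, soff, slen) == -1) {
            free(buf);
            return -1;
        }
        (*nsyncs)++;
    }
    free(buf);
    return 0;
}
```

With the patch applied, the stand-in would be replaced by a direct call to
fsync_range(fd, how, start, length), which is what makes the partial-range
runs meaningful.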
It seems that the patch does roughly what it says on the tin:
# Precreate a 1MB file
$ sync; time ./t_fsync_range /testfs/f 100 0 1000000 d 0 1000000^C
$ dd of=/testfs/f bs=1000 count=1000 if=/dev/full
1000+0 records in
1000+0 records out
1000000 bytes (1.0 MB) copied, 0.00575843 s, 174 MB/s
# Take journaling and atime out of the equation:
$ sudo umount /dev/sdb6
$ sudo tune2fs -O ^has_journal /dev/sdb6
[sudo] password for mtk:
tune2fs 1.42.8 (20-Jun-2013)
$ sudo mount -o norelatime,strictatime /dev/sdb6 /testfs
# Filesystem unmounted and remounted (with above options) before
# each of the following tests
===
# 1000 loops, writing 1 MB, syncing entire 1MB range, with FFILESYNC:
$ time ./t_fsync_range /testfs/f 1000 0 1000000 f 0 1000000
fsync_range(3, 0x20, 0, 1000000)
Performed 16000 writes
Performed 1000 sync operations
real 0m10.677s
user 0m0.011s
sys 0m0.816s
# 1000 loops, writing 1MB, syncing entire 1MB range, with FDATASYNC:
# (Takes less time, as expected)
$ time ./t_fsync_range /testfs/f 1000 0 1000000 d 0 1000000
fsync_range(3, 0x10, 0, 1000000)
Performed 16000 writes
Performed 1000 sync operations
real 0m8.685s
user 0m0.017s
sys 0m0.825s
===
# 1000 loops, writing 1 MB, syncing just 100kB, with FFILESYNC:
# (Takes less time than syncing entire 1MB range, as expected)
$ time ./t_fsync_range /testfs/f 1000 0 1000000 f 0 100000
fsync_range(3, 0x20, 0, 100000)
Performed 16000 writes
Performed 1000 sync operations
real 0m1.501s
user 0m0.005s
sys 0m0.339s
# 1000 loops, writing 1 MB, syncing just 10kB, with FFILESYNC:
$ time ./t_fsync_range /testfs/f 1000 0 1000000 f 0 10000
fsync_range(3, 0x20, 0, 10000)
Performed 16000 writes
Performed 1000 sync operations
real 0m0.616s
user 0m0.004s
sys 0m0.240s
=======
But I have a question:
When I precreate a 10MB file, and repeat the tests (this time with
100 loops), I no longer see any significant difference between
FFILESYNC and FDATASYNC. What am I missing? Sample runs here,
though I did the tests repeatedly with broadly similar results
each time:
# FFILESYNC
$ time ./t_fsync_range /testfs/f 100 0 10000000 f 0 10000000
fsync_range(3, 0x20, 0, 10000000)
Performed 15300 writes
Performed 100 sync operations
real 0m17.575s
user 0m0.001s
sys 0m0.656s
# FDATASYNC
$ time ./t_fsync_range /testfs/f 100 0 10000000 d 0 10000000
fsync_range(3, 0x10, 0, 10000000)
Performed 15300 writes
Performed 100 sync operations
real 0m17.228s
user 0m0.005s
sys 0m0.624s
======
And another question: is there any piece of sync_file_range()
functionality that could or should be incorporated in this API?
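For reference, a minimal sketch of what sync_file_range() offers today that a
plain "sync this range" call does not: the caller can start writeback on a
range without waiting, or wait for it explicitly. The helper names below are
hypothetical and exist only for illustration; the flags are the ones
documented in sync_file_range(2). Note that, per that man page, the call
flushes neither file metadata nor the disk write cache, so it is not a
data-integrity operation in the way fsync() is.

```c
#define _GNU_SOURCE         /* sync_file_range() is Linux-specific */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Start asynchronous writeback of dirty pages in the range; do not wait. */
static int range_start_writeback(int fd, off_t offset, off_t nbytes)
{
    assert(offset >= 0 && nbytes >= 0);
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
}

/* Write out dirty pages in the range and wait until the I/O completes. */
static int range_write_and_wait(int fd, off_t offset, off_t nbytes)
{
    assert(offset >= 0 && nbytes >= 0);
    return sync_file_range(fd, offset, nbytes,
                           SYNC_FILE_RANGE_WAIT_BEFORE |
                           SYNC_FILE_RANGE_WRITE |
                           SYNC_FILE_RANGE_WAIT_AFTER);
}

/* Overwrite 'len' bytes at 'offset', then push just that range to disk. */
static int overwrite_then_flush(int fd, const void *buf, size_t len,
                                off_t offset)
{
    if (pwrite(fd, buf, len, offset) != (ssize_t) len)
        return -1;
    return range_write_and_wait(fd, offset, (off_t) len);
}
```

The split between initiating writeback and waiting for it is the piece that
has no equivalent in the two fsync_range() modes tested above.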
======
Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/