From: Nick Piggin <npiggin@suse.de>
To: Jamie Lokier <jamie@shareable.org>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: [rfc] fsync_range?
Date: Wed, 21 Jan 2009 02:29:00 +0100
Message-ID: <20090121012900.GD24891@wotan.suse.de>
In-Reply-To: <20090120183120.GD27464@shareable.org>

On Tue, Jan 20, 2009 at 06:31:21PM +0000, Jamie Lokier wrote:
> Nick Piggin wrote:
> > Just wondering if we should add an fsync_range syscall like AIX and
> > some BSDs have? It's pretty simple for the pagecache since it
> > already implements the full sync with range syncs anyway. For
> > filesystems and user programs, I imagine it is a bit easier to
> > convert to fsync_range from fsync rather than use the sync_file_range
> > syscall.
> >
> > Having a flags argument is nice, but AIX seems to use O_SYNC as a
> > flag, I wonder if we should follow?
>
> I like the idea. It's much easier to understand than sync_file_range,
> whose man page doesn't really explain how to use it correctly.
>
> But how is fsync_range different from the sync_file_range syscall with
> all its flags set?

sync_file_range would have to wait, then write, then wait. It also
does not call into the filesystem's ->fsync function. I don't know
what the wider consequences of that are for all filesystems, but
for some it means that metadata required to read back the data is
not synced properly, and often it means that metadata sync will not
work.

Filesystems could also be converted to a ->fsync_range function much
more easily, if that would be beneficial to any of them.

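To make the "wait, then write, then wait" sequence concrete, here is a
rough userspace sketch of sync_file_range with all its flags set. The
helper name range_writeout is made up for this example, and the trailing
fdatasync is only my assumption about how a caller might paper over the
missing ->fsync call today; it syncs the whole file, so it is not a
substitute for a real fsync_range.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (name invented for this example): push one byte
 * range of fd towards disk using sync_file_range() with all three flags.
 */
static int range_writeout(int fd, off_t offset, off_t nbytes)
{
	unsigned int flags =
		SYNC_FILE_RANGE_WAIT_BEFORE | /* wait for writeout already in flight */
		SYNC_FILE_RANGE_WRITE |       /* start writeout of dirty pages in range */
		SYNC_FILE_RANGE_WAIT_AFTER;   /* wait for that writeout to finish */

	if (sync_file_range(fd, offset, nbytes, flags) < 0)
		return -1;

	/*
	 * sync_file_range() does not go through the filesystem's ->fsync,
	 * so metadata needed to read the data back may still be dirty.
	 * fdatasync() does go through the filesystem, but it covers the
	 * whole file rather than just this range.
	 */
	return fdatasync(fd);
}
```

With a proper fsync_range the two steps would collapse into one call
that only touches the requested range.
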
> For database writes, you typically write a bunch of stuff in various
> regions of a big file (or multiple files), then ideally fdatasync
> some/all of the written ranges - with writes committed to disk in the
> best order determined by the OS and I/O scheduler.

Do you know which databases do this? It would be nice to ask for their
input and see whether it helps them (I presume it is an OSS database,
because the "big" ones just use direct IO and manage their own
buffers, right?)

Today, they have to just fsync the whole file. So to use this they
must first identify which parts of the file need syncing, and then
gather those parts as a vector.

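As an illustration of that gathering step, an application could record
each write as an (offset, length) pair and coalesce them before syncing.
This is only a sketch: struct sync_range and coalesce_ranges are names
invented here, not an existing kernel or libc interface.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical range descriptor; not an existing kernel or libc type. */
struct sync_range {
	off_t offset;
	off_t len;
};

static int cmp_offset(const void *a, const void *b)
{
	const struct sync_range *x = a, *y = b;

	return (x->offset > y->offset) - (x->offset < y->offset);
}

/*
 * Sort the recorded writes by offset and merge ranges that overlap or
 * touch. Works in place and returns the number of ranges left in v[].
 */
static size_t coalesce_ranges(struct sync_range *v, size_t n)
{
	size_t out = 0;

	if (n == 0)
		return 0;

	qsort(v, n, sizeof(*v), cmp_offset);
	for (size_t i = 1; i < n; i++) {
		off_t end = v[out].offset + v[out].len;

		if (v[i].offset <= end) {
			/* Overlapping or adjacent: extend the current range. */
			off_t new_end = v[i].offset + v[i].len;

			if (new_end > end)
				v[out].len = new_end - v[out].offset;
		} else {
			/* Gap: start a new output range. */
			v[++out] = v[i];
		}
	}
	return out + 1;
}
```

The resulting array is what a vectored sync call would take as its
argument.
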
> For this, taking a vector of multiple ranges would be nice.
> Alternatively, issuing parallel fsync_range calls from multiple
> threads would approximate the same thing - if (big if) they aren't
> serialised by the kernel.

I was thinking about doing something like that, but I just wanted to
get basic fsync_range... OTOH, we could do an fsyncv syscall and glibc
could implement fsync_range on top of that?
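
Roughly what I mean by layering, as a userspace sketch only. Neither
call exists on Linux, so fsyncv_sketch, fsync_range_sketch and struct
fsync_vec are hypothetical names, and the body emulates the idea with
sync_file_range plus a whole-file fdatasync. A real fsyncv would sync
only the listed ranges inside the kernel.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical range descriptor for this sketch; not an existing ABI. */
struct fsync_vec {
	off_t offset;
	off_t len;
};

/*
 * Userspace stand-in for a hypothetical fsyncv(): start writeout on
 * every range first, so all of it is queued before anything waits, then
 * fall back to fdatasync() to wait and to go through the filesystem.
 */
static int fsyncv_sketch(int fd, const struct fsync_vec *vec, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (sync_file_range(fd, vec[i].offset, vec[i].len,
				    SYNC_FILE_RANGE_WRITE) < 0)
			return -1;
	}
	return fdatasync(fd);
}

/* fsync_range as the one-element case, as a library wrapper might do it. */
static int fsync_range_sketch(int fd, off_t offset, off_t len)
{
	struct fsync_vec one = { .offset = offset, .len = len };

	return fsyncv_sketch(fd, &one, 1);
}
```
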