linux-fsdevel.vger.kernel.org archive mirror
From: Jamie Lokier <jamie@shareable.org>
To: Jeff Garzik <jeff@garzik.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Chris Wedgwood <cw@f00f.org>
Subject: Re: Proposal for "proper" durable fsync() and fdatasync()
Date: Wed, 27 Feb 2008 14:16:46 +0000
Message-ID: <20080227141646.GA22850@shareable.org>
In-Reply-To: <47C45267.4090105@garzik.org>

Jeff Garzik wrote:
> >It's not optimal even then.
> >
> >  Devices: On a software RAID, you ideally don't want to issue flushes
> >  to all drives if your database did a 1 block commit entry.  (But they
> >  probably use O_DIRECT anyway, changing the rules again).  But all that
> >  can be optimised in generic VFS code eventually.  It doesn't need
> >  filesystem assistance in most cases.
> 
> My own idea is that we create a FLUSH command for blkdev request queues,
> to exist alongside READ, WRITE, and the current barrier implementation.
> Then FLUSH could be passed down through MD or DM.

I like your thought, and it has the benefit of being simple.

My thought is very similar, but with (hopefully not premature...)
optimisations:

  - I would merge FLUSH with a preceding write in some cases,
    converting to an FUA-write command.  Probably the generic request
    queue is the best place to detect and merge.  This is so that
    userspace filesystems (including guest VMs) and databases can do
    journal commits with the same I/O sequence as in kernel
    filesystems.

  - I would create BARRIER too, so that a userspace API can ask for
    this weaker form of fsync, which may improve throughput of
    userspace journalling.  

  - I would include a sector range in FLUSH and BARRIER, for MD and DM
    to flush _only_ relevant sub-devices.  This may improve performance
    for journalling both kernel and userspace filesystems, as journal
    commits are often very small and hit one or two sub-devices in RAID.

  - I would ask the nice MD and DM people to take tag-barriers rather
    than flush-barriers on the input queue, converting to
    tag-barriers, flush-barriers and independent FLUSH on the
    sub-device queues according to sector ranges and subsequent
    writes.  It's not obvious, but my barrier proposal, which started
    this thread, is designed to support an efficient inter-sub-device
    flush-barrier when necessary, and single-sub-device tag-barrier
    when possible.
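
To make the first and third points concrete, here is a rough sketch in
Python rather than kernel C.  Everything in it is an assumption made for
illustration: the Request type, the WRITE/FUA_WRITE/FLUSH names, the rule
that a FLUSH is merged only when it covers exactly the preceding write,
and a plain RAID-0 layout with fixed-size chunks.  It is not the block
layer's API and not a worked-out design.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical request kinds; illustrative names, not kernel constants.
WRITE, FUA_WRITE, FLUSH = "WRITE", "FUA_WRITE", "FLUSH"


@dataclass
class Request:
    kind: str
    start: int  # first sector covered
    count: int  # number of sectors covered


def merge_flush_into_fua(queue: List[Request]) -> List[Request]:
    """Fold a ranged FLUSH into the immediately preceding WRITE.

    Assumed merge rule: only when the FLUSH covers exactly the sectors
    of that write.  Any other FLUSH is left in the queue untouched.
    """
    out: List[Request] = []
    for req in queue:
        prev = out[-1] if out else None
        if (
            req.kind == FLUSH
            and prev is not None
            and prev.kind == WRITE
            and (prev.start, prev.count) == (req.start, req.count)
        ):
            # One FUA write replaces the write + flush pair.
            out[-1] = Request(FUA_WRITE, prev.start, prev.count)
        else:
            out.append(req)
    return out


def stripe_members(start: int, count: int,
                   chunk_sectors: int, n_devices: int) -> Set[int]:
    """Return the RAID-0 member indices touched by a sector range.

    Assumes chunk k lives on member k % n_devices.
    """
    if count <= 0:
        return set()
    first_chunk = start // chunk_sectors
    last_chunk = (start + count - 1) // chunk_sectors
    if last_chunk - first_chunk + 1 >= n_devices:
        # The range spans at least one full stripe: every member is hit.
        return set(range(n_devices))
    return {c % n_devices for c in range(first_chunk, last_chunk + 1)}
```

With 128-sector chunks on four members, an 8-sector journal commit at
sector 1000 falls entirely inside chunk 7, so stripe_members(1000, 8,
128, 4) names a single member and only that one would need the FLUSH.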

-- Jamie

