linux-fsdevel.vger.kernel.org archive mirror
From: Jens Axboe <jens.axboe@oracle.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: linux-fsdevel@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: O_DIRECT and barriers
Date: Fri, 21 Aug 2009 13:40:10 +0200
Message-ID: <20090821114010.GG12579@kernel.dk>
In-Reply-To: <20090820221221.GA14440@infradead.org>

On Thu, Aug 20 2009, Christoph Hellwig wrote:
> Btw, something semi-related I've been looking at recently:
> 
> Currently O_DIRECT writes bypass all kernel caches, but they do
> use the disk caches.  We currently don't have any barrier support for
> them at all, which is really bad for data integrity in virtualized
> environments.  I've started thinking about how to implement this.
> 
> The simplest scheme would be to mark the last request of each
> O_DIRECT write as a barrier request.  This works nicely from the FS
> perspective and works with all hardware supporting barriers.  It's
> massive overkill though - we really only need to flush the cache
> after our request, and not before.  And for SCSI we would be much
> better off just setting the FUA bit on the commands and not requiring
> a full cache flush at all.
> 
> The next scheme would be to simply always do a cache flush after
> the direct I/O write has completed, but given that blkdev_issue_flush
> blocks until the command is done that would a) require everyone to
> use the end_io callback and b) spend a lot of time in that workqueue.
> This only requires one full cache flush, but it's still suboptimal.
> 
> I have prototyped this for XFS, but I don't really like it.
> 
> The best scheme would be to get some high-level FUA request in the
> block layer which gets emulated by a post-command cache flush.

I've talked to Chris about this in the past too, but I never got around
to benchmarking FUA for O_DIRECT. It should be pretty easy to wire up
without making too many changes, and we do have FUA support on most SATA
drives too. Basically just a check in the driver for whether the
request is O_DIRECT and a WRITE, a la:

        if (rq_data_dir(rq) == WRITE && rq_is_sync(rq))
                WRITE_FUA;

I know that FUA is used by that other OS, so I think we should be golden
on the hw support side.
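
The check above, together with the fallback Christoph describes (a
high-level FUA request emulated by a post-command cache flush), can be
sketched as a small userspace model. This is an illustration only, not
kernel code: every name in it (model_request, model_device,
choose_action and the ACT_* values) is made up for the example, and
using a "sync" flag as a stand-in for an O_DIRECT write is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; none of these are kernel types. */
enum io_dir { IO_READ, IO_WRITE };

struct model_request {
	enum io_dir dir;	/* data direction of the request */
	bool sync;		/* caller waits for completion (assumed proxy for O_DIRECT) */
};

struct model_device {
	bool has_write_cache;	/* volatile write cache is enabled */
	bool supports_fua;	/* device honours FUA on write commands */
};

enum durability_action {
	ACT_NONE,		/* no extra work needed for this request */
	ACT_SET_FUA,		/* issue the write with the FUA bit set */
	ACT_POST_FLUSH		/* issue the write, then flush the disk cache */
};

/*
 * Decide how a single request should be made durable.
 * Only synchronous writes to a device with a volatile cache need
 * anything; FUA is preferred, a post-write flush is the fallback.
 */
enum durability_action
choose_action(const struct model_request *rq, const struct model_device *dev)
{
	assert(rq != NULL && dev != NULL);

	if (rq->dir != IO_WRITE || !rq->sync)
		return ACT_NONE;
	if (!dev->has_write_cache)
		return ACT_NONE;

	return dev->supports_fua ? ACT_SET_FUA : ACT_POST_FLUSH;
}
```

The point of the model is that the per-request decision is cheap and
local: only the no-FUA case pays for a cache flush, and only after the
write rather than on both sides as a barrier would.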

-- 
Jens Axboe



