linux-fsdevel.vger.kernel.org archive mirror
From: Chris Mason <chris.mason@oracle.com>
To: Christoph Hellwig <hch@lst.de>
Cc: "Ted Ts'o" <tytso@mit.edu>,
	hughd@google.com, hirofumi@mail.parknet.co.jp,
	swhiteho@redhat.com, linux-fsdevel@vger.kernel.org,
	jaxboe@fusionio.com, martin.petersen@oracle.com
Subject: Re: discard and barriers
Date: Sat, 14 Aug 2010 11:46:36 -0400
Message-ID: <20100814154636.GP3315@think>
In-Reply-To: <20100814145210.GA23126@lst.de>

On Sat, Aug 14, 2010 at 04:52:10PM +0200, Christoph Hellwig wrote:
> On Sat, Aug 14, 2010 at 10:14:51AM -0400, Ted Ts'o wrote:
> > Also, to be clear, the block layer will guarantee that a trim/discard
> > of block 12345 will not be reordered with respect to a write to block
> > 12345, correct?
> 
> Right now that is what the hardbarrier does, and that's what we're
> trying to get rid of.

So btrfs will wait_on_{page/buffer/bio} to meet all ordering
requirements. This holds both for transaction commit and for discard.
Reiserfs has the exception you already know about.

> For XFS we prevent this by something that is
> called the busy extent list - extents deleted by a transaction are
> inserted into it (it's actually an rbtree, not a list, these days),
> and before we can reuse blocks from it we need to ensure that it
> is fully committed.  Discards only happen off that list and extents
> are only removed from it once the discard has finished.  I assume
> other filesystems have a similar mechanism.
> 
> > And on SATA devices, where discard requests are not queued requests,
> > the ata layer will have to do a queue flush *before* the discard is
> > sent, right?

Another way to say this: we have to be 100% sure that if we write
something after a discard, the storage will do that write after it does
the discard.
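
One filesystem-side way to get that guarantee without a hard barrier is
the busy extent scheme Christoph describes above. What follows is only a
toy sketch of the idea, not XFS code: the class, its method names and the
three states are made up for illustration, and a plain dict stands in for
the rbtree. The point it shows is that a freed range stays unusable until
the freeing transaction has committed and the discard on it has finished.

```python
class BusyExtents:
    """Toy busy-extent tracker (hypothetical; not the XFS implementation)."""

    def __init__(self):
        # (start, length) -> "freed" | "committed" | "discarding"
        self._busy = {}

    def free(self, start, length):
        # An extent deleted by the running transaction becomes busy.
        self._busy[(start, length)] = "freed"

    def commit(self):
        # The transaction that freed these extents is now on stable storage.
        for ext, state in list(self._busy.items()):
            if state == "freed":
                self._busy[ext] = "committed"

    def start_discards(self):
        # Discards are only issued for extents whose free has committed.
        issued = []
        for ext, state in list(self._busy.items()):
            if state == "committed":
                self._busy[ext] = "discarding"
                issued.append(ext)
        return issued

    def discard_done(self, ext):
        # Only a finished discard removes the extent from the busy set.
        if self._busy.get(ext) != "discarding":
            raise ValueError("no discard in flight for %r" % (ext,))
        del self._busy[ext]

    def can_reuse(self, start, length):
        # A range may be reallocated only if it overlaps no busy extent.
        end = start + length
        return not any(s < end and start < s + l for (s, l) in self._busy)
```

With this shape the allocator asks can_reuse() before handing out
blocks, so a write to a discarded range cannot even be generated until
the discard has completed.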

I'm not actually worried about writes before the discard, because the
worst case for us is that the drive fails to discard something it could
have (this is the drive's problem).  Cache flushes from the FS will cover
the case where transaction commits depend on the data going in before the
discard.

I care a lot about the write after the discards though.  If the discards
themselves become async, that's ok too as long as we have some way to do
end_io processing on them.
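
To make the async case concrete, here is a small sketch of what "end_io
processing" on a discard could buy us. Everything in it is hypothetical:
ToyQueue is a stand-in for a device queue, not a block layer API, and
OrderedWriter is an invented helper. It assumes the filesystem gets a
completion callback per discard and holds back any write to a block that
still has a discard in flight.

```python
from collections import deque


class ToyQueue:
    """Hypothetical device queue; records the order requests execute in."""

    def __init__(self):
        self.log = []
        self._inflight = deque()  # pending discards: (block, end_io)

    def submit_discard(self, block, end_io):
        # Async: the discard is queued and completes some time later.
        self._inflight.append((block, end_io))

    def submit_write(self, block):
        self.log.append(("write", block))

    def complete_one(self):
        # The device finishes the oldest discard and calls its end_io.
        block, end_io = self._inflight.popleft()
        self.log.append(("discard", block))
        end_io(block)


class OrderedWriter:
    """Defers writes to a block until all discards on it have completed."""

    def __init__(self, queue):
        self.queue = queue
        self.inflight = {}  # block -> number of discards still in flight
        self.deferred = {}  # block -> number of writes held back

    def discard(self, block):
        self.inflight[block] = self.inflight.get(block, 0) + 1
        self.queue.submit_discard(block, self._end_io)

    def write(self, block):
        if block in self.inflight:
            # A discard is pending on this block: hold the write.
            self.deferred[block] = self.deferred.get(block, 0) + 1
        else:
            self.queue.submit_write(block)

    def _end_io(self, block):
        self.inflight[block] -= 1
        if self.inflight[block] == 0:
            del self.inflight[block]
            # Last discard is done; release the writes that were waiting.
            for _ in range(self.deferred.pop(block, 0)):
                self.queue.submit_write(block)
```

A write to an unrelated block goes straight through, while a write to
block 12345 issued right after its discard only reaches the queue from
the end_io callback, which is the ordering I care about.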

-chris
