linux-fsdevel.vger.kernel.org archive mirror
From: Ted Ts'o <tytso@mit.edu>
To: Christoph Hellwig <hch@lst.de>
Cc: hughd@google.com, hirofumi@mail.parknet.co.jp,
	chris.mason@oracle.com, swhiteho@redhat.com,
	linux-fsdevel@vger.kernel.org, jaxboe@fusionio.com,
	martin.petersen@oracle.com
Subject: Re: discard and barriers
Date: Sun, 15 Aug 2010 13:39:06 -0400
Message-ID: <20100815173906.GA20124@thunk.org>
In-Reply-To: <20100814145210.GA23126@lst.de>

On Sat, Aug 14, 2010 at 04:52:10PM +0200, Christoph Hellwig wrote:
> On Sat, Aug 14, 2010 at 10:14:51AM -0400, Ted Ts'o wrote:
> > Also, to be clear, the block layer will guarantee that a trim/discard
> > of block 12345 will not be reordered with respect to a write block
> > 12345, correct?
> 
> Right now that is what the hardbarrier does, and that's what we're
> trying to get rid of.  For XFS we prevent this by something that is
> called the busy extent list - extents deleted by a transaction are
> inserted into it (it's actually an rbtree, not a list, these days),
> and before we can reuse blocks from it we need to ensure that it
> is fully committed.  Discards only happen off that list and extents
> are only removed from it once the discard has finished.  I assume
> other filesystems have a similar mechanism.

So ext4 does the transaction commit (which guarantees that the file
delete has hit the disk platters), and *then* issues the discard, and
*then* we zap the busy extent list.  That's the only safe thing to do,
since if we crash before the transaction gets committed, we lose the
data blocks, so I can't issue the discard until after I wait for the
commit block to finish.  This should be the case regardless of
anything we change with respect to how the discard operation works,
since if we discard and then crash before the commit block is written,
data blocks will get lost that should not be discarded.  Am I missing
something?
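
The ordering argument above can be checked with a small toy model.  This
is only a sketch, not ext4 code: the step names ("commit", "discard",
"release") and the function crash_is_safe() are hypothetical, and the
model assumes that a crash before the commit rolls the delete back, so
the file still references its data blocks.

```python
# Toy model of the commit -> discard -> release ordering.
# Hypothetical names throughout; this is not ext4 or jbd2 code.

def crash_is_safe(order, crash_after):
    """Run the first `crash_after` steps of `order`, then "crash".

    Returns True if no live data was lost, under the assumption that an
    uncommitted delete is rolled back on recovery.
    """
    committed = False  # has the delete transaction reached stable storage?
    discarded = False  # have the data blocks been discarded?
    for step in order[:crash_after]:
        if step == "commit":
            committed = True
        elif step == "discard":
            discarded = True
        # "release" (dropping the busy extents) changes nothing on disk
    # Data is lost only if blocks were discarded while the file that
    # owns them still exists after recovery.
    return not (discarded and not committed)


SAFE_ORDER = ["commit", "discard", "release"]
UNSAFE_ORDER = ["discard", "commit", "release"]

if __name__ == "__main__":
    for n in range(len(SAFE_ORDER) + 1):
        print("safe order, crash after", n, "steps:",
              crash_is_safe(SAFE_ORDER, n))
    print("unsafe order, crash after 1 step:",
          crash_is_safe(UNSAFE_ORDER, 1))
```

With the commit first, every crash point leaves the data intact; with
the discard first, a crash between the discard and the commit loses
blocks that recovery still considers live.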

So after the flush/ordering changes that have been proposed,
if the block device layer is free to reorder the discard and a
subsequent write to a discarded block, I will need to add a *new* wait
for the discard to complete before I can free the busy extent list.
And this will be true for all file systems that are currently issuing
discards.  Again, am I missing something?

This implies that if the changes to allow the reordering of the
discard and the subsequent writes to the discarded blocks go in
*before* we update all of the filesystems, then there is the potential
for data loss.  And while most file systems don't do discards by
default and instead require a mount option, this still might be
considered undesirable.

So that means we need to add the end-io callbacks to the discard
operations *first*, before we remove the implicit flush/ordering
guarantees.
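
As a rough illustration of what such a completion callback buys, here
is a user-space sketch.  It is not kernel code: ToyBlockLayer, ToyFs,
and their methods are hypothetical names, and the model simply assumes
that a discard finishes at some later point, when the device calls the
end_io function it was given.  A block stays on the busy list, and so
cannot be reallocated and rewritten, until that callback has run.

```python
# Toy model: keep blocks busy until the discard's completion callback.
# Hypothetical names throughout; this is not the kernel block layer API.

class ToyBlockLayer:
    """Queues discards and completes them only when asked to."""

    def __init__(self):
        self.pending = []  # list of (block, end_io) pairs

    def submit_discard(self, block, end_io):
        # The discard is merely queued; it has not happened yet.
        self.pending.append((block, end_io))

    def complete_all(self):
        # Finish every queued discard and notify the submitter.
        pending, self.pending = self.pending, []
        for block, end_io in pending:
            end_io(block)


class ToyFs:
    def __init__(self, dev):
        self.dev = dev
        self.busy = set()  # freed by a committed transaction, discard in flight
        self.free = set()  # safe to hand out again

    def free_after_commit(self, block):
        # Assumed to be called only after the deleting transaction committed.
        self.busy.add(block)
        self.dev.submit_discard(block, self._discard_done)

    def _discard_done(self, block):
        # Completion callback: only now may the block be reused.
        self.busy.discard(block)
        self.free.add(block)

    def can_allocate(self, block):
        return block in self.free and block not in self.busy


if __name__ == "__main__":
    dev = ToyBlockLayer()
    fs = ToyFs(dev)
    fs.free_after_commit(12345)
    print(fs.can_allocate(12345))  # discard still in flight
    dev.complete_all()
    print(fs.can_allocate(12345))  # discard has completed
```

Because no write to block 12345 can be issued while it is busy, the
filesystem no longer depends on the block layer to keep the discard
ordered ahead of that write.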

I thought you were saying that it should be safe to remove the
flush/ordering guarantees in your earlier messages, but this is
leaving me quite confused.  Did I misunderstand you?

					- Ted

Thread overview: 14+ messages
2010-08-14 11:56 discard and barriers Christoph Hellwig
2010-08-14 14:14 ` Ted Ts'o
2010-08-14 14:52   ` Christoph Hellwig
2010-08-14 15:46     ` Chris Mason
2010-08-14 17:22       ` Christoph Hellwig
2010-08-14 20:11       ` Hugh Dickins
2010-08-15 17:39     ` Ted Ts'o [this message]
2010-08-15 19:02       ` Christoph Hellwig
2010-08-15 21:25         ` Ted Ts'o
2010-08-15 21:30           ` Christoph Hellwig
2010-08-16  9:41     ` Steven Whitehouse
2010-08-16 11:26       ` Christoph Hellwig
2010-08-17 10:59         ` Steven Whitehouse
2010-08-23 16:42 ` Christoph Hellwig
