linux-ext4.vger.kernel.org archive mirror
From: Theodore Tso <tytso@mit.edu>
To: Chris Mason <chris.mason@oracle.com>
Cc: Matthew Wilcox <willy@linux.intel.com>,
	Jens Axboe <jens.axboe@oracle.com>,
	Ric Wheeler <rwheeler@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: Is TRIM/DISCARD going to be a performance problem?
Date: Mon, 11 May 2009 15:19:57 -0400
Message-ID: <20090511191957.GD21518@mit.edu>
In-Reply-To: <1242067995.13509.1.camel@localhost.localdomain>

On Mon, May 11, 2009 at 02:53:15PM -0400, Chris Mason wrote:
> > Actually, that's the exact opposite of what you want.  You want to try
> > to reuse blocks that are scheduled for trimming so that we never have to
> > send the command at all.
> 
> Regardless of the optimal way to reuse blocks, we need some way of
> knowing the discard is done, or at least sent down to the device in such
> a way that any writes will happen after the discard and not before.

An easy way of solving this is simply to have a way for the block
allocator to inform the discard management layer that a particular
block is in use again.  That will prevent the discard from
happening.  If the discard is already in flight, the interface won't
return until the discard is done.  (This is where real OS-controlled
ordering via dependencies, which NCQ doesn't provide, combined with
discard/TRIM as a queueable operation would be really handy.)
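A minimal userspace sketch of the interface described above, in Python purely for illustration. Everything in it is hypothetical: `DiscardManager` and its `block_freed`, `block_reused`, `issue` and `complete` methods are not a kernel API, and one condition variable stands in for whatever synchronization a real implementation would use. It shows the two cases: a queued discard for a reused block is simply cancelled, while an in-flight one makes the caller wait until the (simulated) device completes it, so the new write cannot overtake the discard.

```python
import threading


class DiscardManager:
    """Hypothetical discard management layer (illustrative names only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = set()    # freed blocks queued for a future discard
        self._in_flight = set()  # blocks whose discard was sent to the device

    def block_freed(self, block):
        # Called by the filesystem when a block is deallocated.
        with self._cond:
            self._pending.add(block)

    def block_reused(self, block):
        # Called by the block allocator before handing a block out again.
        with self._cond:
            if block in self._pending:
                # Discard not issued yet: cancelling it is enough.
                self._pending.discard(block)
                return "cancelled"
            waited = False
            while block in self._in_flight:
                # Discard already sent: don't return until it completes,
                # so the new write cannot be ordered before the discard.
                self._cond.wait()
                waited = True
            return "waited" if waited else "none"

    def issue(self):
        # Move every pending block to in-flight and return the batch to send.
        with self._cond:
            batch = sorted(self._pending)
            self._in_flight.update(batch)
            self._pending.clear()
            return batch

    def complete(self, blocks):
        # Simulated device completion path: wake any waiting allocator.
        with self._cond:
            self._in_flight.difference_update(blocks)
            self._cond.notify_all()
```

Note that every allocation and free goes through the same lock here, which is precisely the SMP contention concern raised in the following paragraph.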

One of the things I worry about is that the discard management layer
could become an SMP contention point, since the filesystem will need to
call it before every block allocation or deallocation.

Hmm... maybe the better approach is to let the filesystem keep the
authoritative list of what's free and not free, and only keep track of
the range of blocks where some deallocation has taken place.  Then when
the filesystem is quiescent, we can lock out block allocations, scan the
block bitmaps, and send a trim request for anything that's not in
use in a particular region (i.e., allocation group) of the filesystem.
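A sketch of this lazier scheme, again in Python and with hypothetical names (`DirtyRange`, `free_extents`, `trim_candidates`). The filesystem's bitmap stays authoritative (`True` means in use); the side structure only remembers the lowest and highest block freed since the last pass. When the filesystem is quiescent, allocations are locked out, the dirty span is rescanned, and each free run in it becomes a trim candidate.

```python
import threading


class DirtyRange:
    """Remembers only the span [lo, hi) in which blocks have been freed.

    Not thread-safe by itself; callers are assumed to hold the
    allocation lock, as the filesystem would on a free.
    """

    def __init__(self):
        self.lo = None
        self.hi = None

    def note_free(self, block):
        # O(1) per deallocation; no per-block state is kept.
        self.lo = block if self.lo is None else min(self.lo, block)
        self.hi = block + 1 if self.hi is None else max(self.hi, block + 1)

    def take(self):
        # Return the span and reset it for the next pass.
        span = (self.lo, self.hi)
        self.lo = self.hi = None
        return span


def free_extents(bitmap, start, end):
    """Return (first_block, length) runs of free blocks in bitmap[start:end]."""
    extents, run_start = [], None
    for blk in range(start, end):
        if not bitmap[blk]:
            if run_start is None:
                run_start = blk
        elif run_start is not None:
            extents.append((run_start, blk - run_start))
            run_start = None
    if run_start is not None:
        extents.append((run_start, end - run_start))
    return extents


def trim_candidates(alloc_lock, bitmap, dirty):
    """With allocations locked out, rescan only the dirty span of the bitmap."""
    with alloc_lock:
        lo, hi = dirty.take()
        if lo is None:
            return []
        return free_extents(bitmap, lo, hi)
```

Because the bitmap is consulted at scan time rather than at free time, a block that was freed and then reallocated before the pass is skipped without any extra bookkeeping.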

After all, quiescing the block I/O queues is what is expensive;
sending a large number of block ranges attached to a single ATA TRIM
command looks cheap by comparison.  So maybe we just lock out the
block group, send a TRIM for all the unused blocks in that block
group, and only keep track of which block groups should be scanned via
a flag in the block group descriptors.  That might be a much simpler
approach.
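A sketch of the per-block-group variant. `BlockGroup`, its `needs_trim` flag, `free_runs` and `trim_payload` are hypothetical; ext4's real group descriptors are not modeled. The payload follows the ATA DATA SET MANAGEMENT (TRIM) range-entry layout as commonly documented: each 8-byte little-endian entry holds a 48-bit starting LBA in the low bits and a 16-bit sector count in the high bits, 64 entries fill one 512-byte payload sector, and entries with a zero count are ignored. The default of 8 sectors per block assumes 4 KiB filesystem blocks on 512-byte sectors.

```python
import struct
import threading

MAX_RANGE_SECTORS = 0xFFFF  # 16-bit sector-count field in each range entry
ENTRIES_PER_SECTOR = 64     # 512-byte payload sector / 8-byte entries


class BlockGroup:
    """Toy block group with a hypothetical 'needs trim' descriptor flag."""

    def __init__(self, first_block, bitmap):
        self.first_block = first_block
        self.bitmap = bitmap      # list of bools, True = block in use
        self.needs_trim = False   # set on deallocation, cleared after TRIM
        self.lock = threading.Lock()

    def free_block(self, idx):
        with self.lock:
            self.bitmap[idx] = False
            self.needs_trim = True


def free_runs(bitmap):
    """Return (start, length) runs of free blocks, relative to the group."""
    runs, start = [], None
    for i, used in enumerate(bitmap + [True]):  # sentinel closes a final run
        if not used and start is None:
            start = i
        elif used and start is not None:
            runs.append((start, i - start))
            start = None
    return runs


def trim_payload(group, sectors_per_block=8):
    """Lock out the group and build one TRIM payload for all its free blocks."""
    with group.lock:
        entries = []
        for start, length in free_runs(group.bitmap):
            lba = (group.first_block + start) * sectors_per_block
            count = length * sectors_per_block
            while count > 0:  # split runs longer than 65535 sectors
                n = min(count, MAX_RANGE_SECTORS)
                entries.append(struct.pack("<Q", lba | (n << 48)))
                lba += n
                count -= n
        group.needs_trim = False
    # Pad to whole 512-byte sectors; all-zero entries carry no range.
    pad = (-len(entries)) % ENTRIES_PER_SECTOR
    return b"".join(entries) + b"\x00" * (8 * pad)
```

The group lock is held only while the bitmap is scanned and the payload is built, so the cost is paid once per flagged group rather than on every allocation.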

- Ted



Thread overview: 33+ messages
2009-05-09 21:14 Is TRIM/DISCARD going to be a performance problem? Theodore Ts'o
2009-05-10 16:53 ` Jörn Engel
2009-05-11  8:37   ` Theodore Tso
2009-05-11 10:06     ` Jörn Engel
2009-05-11 10:18       ` Jens Axboe
2009-05-11 15:43         ` Jeff Garzik
2009-05-11 11:27       ` Theodore Tso
2009-05-11 12:09         ` Theodore Tso
2009-05-11 13:10           ` Greg Freemyer
2009-05-11 13:39             ` Matthew Wilcox
2009-05-11 14:27             ` Theodore Tso
2009-05-11 14:29               ` Ric Wheeler
2009-05-11 14:50                 ` Theodore Tso
2009-05-11 14:58                   ` Ric Wheeler
2009-05-11 15:00                   ` Matthew Wilcox
2009-05-11 18:47                     ` Greg Freemyer
2009-05-11 19:22                       ` Andreas Dilger
2009-05-11 23:38                       ` Neil Brown
2009-05-12 13:28                         ` Greg Freemyer
2009-05-11 13:15           ` Ric Wheeler
2010-04-24 17:11           ` Phillip Susi
2009-05-11 12:43         ` Jörn Engel
2009-05-11 12:48           ` Matthew Wilcox
     [not found]             ` <f3177b9e0905111433i40e41c90r920d7ccf36442ffd@mail.gmail.com>
2009-05-11 22:03               ` Chris Worley
2009-05-11 16:30       ` Chris Worley
2009-05-11  8:12 ` Jens Axboe
2009-05-11  8:41   ` Theodore Tso
2009-05-11  8:49     ` Jens Axboe
2009-05-11 17:18     ` Chris Mason
2009-05-11 18:43       ` Matthew Wilcox
2009-05-11 18:53         ` Chris Mason
2009-05-11 19:19           ` Theodore Tso [this message]
2009-05-29 10:52         ` Florian Weimer
