From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christoph Hellwig
Subject: Re: discard and barriers
Date: Sat, 14 Aug 2010 16:52:10 +0200
Message-ID: <20100814145210.GA23126@lst.de>
References: <20100814115625.GA15902@lst.de> <20100814141451.GB14960@thunk.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Christoph Hellwig , hughd@google.com, hirofumi@mail.parknet.co.jp, chris.mason@oracle.com, swhiteho@redhat.com, linux-fsdevel@vger.kernel.org, jaxboe@fusionio.com, martin.petersen@oracle.com
To: "Ted Ts'o"
Return-path:
Received: from verein.lst.de ([213.95.11.210]:52635 "EHLO verein.lst.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756708Ab0HNOwl (ORCPT ); Sat, 14 Aug 2010 10:52:41 -0400
Content-Disposition: inline
In-Reply-To: <20100814141451.GB14960@thunk.org>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Sat, Aug 14, 2010 at 10:14:51AM -0400, Ted Ts'o wrote:
> Also, to be clear, the block layer will guarantee that a trim/discard
> of block 12345 will not be reordered with respect to a write block
> 12345, correct?

Right now that is what the hardbarrier does, and that's what we're
trying to get rid of.  For XFS we prevent this by something that is
called the busy extent list - extents deleted by a transaction are
inserted into it (it's actually a rbtree not a list these days), and
before we can reuse blocks from it we need to ensure that it is fully
committed.  Discards only happen off that list and extents are only
removed from it once the discard has finished.  I assume other
filesystems have a similar mechanism.

> And on SATA devices, where discard requests are not queued requests,
> the ata layer will have to do a queue flush *before* the discard is
> sent, right?

Yes.

> But things should be a tiny bit better even with SATA
> because we won't need to wait for the barrier to be acknowledged
> before sending more write requests to the device.  If I understand
> things correctly, the main place where this will have benefit will be
> for more advanced interfaces like SAS?

The performance improvement is primarily interesting for Fibre or iSCSI
attached arrays with thin provisioning support.  I've not seen a
TP-capable SAS device yet.  The other motivation is that this is the
last piece that relies on the ordering semantics of barriers, which
we're trying to get rid of.
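
For readers who have not looked at the busy extent list, the idea can be
sketched roughly as below.  This is only an illustration in Python, not
the XFS code: the class and method names (BusyExtentList, free_extent,
commit, can_reuse, discard_committed) are made up for this sketch, a
plain list stands in for the rbtree, and the discard is treated as
synchronous.

```python
from dataclasses import dataclass


@dataclass
class BusyExtent:
    start: int    # first block of the freed extent
    length: int   # number of blocks
    txn: int      # id of the transaction that freed it (hypothetical)

    @property
    def end(self):
        return self.start + self.length


class BusyExtentList:
    """Toy model of a busy extent list (a plain list, not an rbtree)."""

    def __init__(self):
        self.busy = []           # extents freed but not yet safe to reuse
        self.committed = set()   # transaction ids known to be committed

    def free_extent(self, txn, start, length):
        # Extents deleted by a transaction are inserted into the list.
        self.busy.append(BusyExtent(start, length, txn))

    def commit(self, txn):
        # Record that the freeing transaction is fully committed.
        self.committed.add(txn)

    def can_reuse(self, start, length):
        # Simplification: any block range overlapping a busy extent
        # must not be handed out again while the extent is on the list.
        end = start + length
        return not any(e.start < end and start < e.end for e in self.busy)

    def discard_committed(self, issue_discard):
        # Discards only happen off the list, and only for extents whose
        # transaction has committed.  An extent is removed only after
        # its discard has finished (synchronous in this sketch).
        ready = [e for e in self.busy if e.txn in self.committed]
        for e in ready:
            issue_discard(e.start, e.length)
            self.busy.remove(e)
        return len(ready)
```

Because a block range cannot be reallocated, and hence rewritten, while
it is still on the list, a discard and a later write to the same blocks
cannot be reordered against each other, without relying on a barrier.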