* discard and barriers
@ 2010-08-14 11:56 Christoph Hellwig
  2010-08-14 14:14 ` Ted Ts'o
  2010-08-23 16:42 ` Christoph Hellwig
  0 siblings, 2 replies; 14+ messages in thread
From: Christoph Hellwig @ 2010-08-14 11:56 UTC (permalink / raw)
  To: hughd, hirofumi, tytso, chris.mason, swhiteho
  Cc: linux-fsdevel, jaxboe, martin.petersen

Currently all filesystems submit their discard requests as
BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER type requests.  That is, they both
wait for completion synchronously and submit them as a barrier.

For those not part of the current barrier discussion, the barrier flag
has two implications:

 (a) it prevents reordering of the request with any previous or later
     one
 (b) it causes a cache flush before the request and ensures the request
     is on disk when it returns, which at least for typical SATA devices
     means another flush request, as we don't enable the FUA bit (which
     isn't applicable to TRIM or UNMAP anyway)

(a) is something we are planning to get rid of in the block layer
completely, so we'll need to figure out how to deal with it for
discards.  (b) doesn't actually seem to be necessary for discards from
my research - given that discards are an optimization for the storage
device, we don't care whether one actually hits the disk or not in case
of a crash.  And given that the definition of SYNCHRONIZE CACHE only
deals with data, it's not even defined that it affects discard commands.

Can you guys review whether you rely on (a) in your filesystems and, if
yes, help me figure out a good way to replace it?  All the callers look
like they do not actually rely on it, because they seem to wait for the
actual block-freeing metadata I/O beforehand, but I'm not sure.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: discard and barriers
  2010-08-14 11:56 discard and barriers Christoph Hellwig
@ 2010-08-14 14:14 ` Ted Ts'o
  2010-08-14 14:52   ` Christoph Hellwig
  2010-08-23 16:42 ` Christoph Hellwig
  1 sibling, 1 reply; 14+ messages in thread
From: Ted Ts'o @ 2010-08-14 14:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: hughd, hirofumi, chris.mason, swhiteho, linux-fsdevel, jaxboe,
	martin.petersen

On Sat, Aug 14, 2010 at 01:56:25PM +0200, Christoph Hellwig wrote:
> Can you guys review whether you rely on (a) in your filesystems and,
> if yes, help me figure out a good way to replace it?  All the callers
> look like they do not actually rely on it, because they seem to wait
> for the actual block-freeing metadata I/O beforehand, but I'm not
> sure.

In no-journal mode we don't actually wait, but nothing is guaranteed
after a crash without a journal anyway, so I think we're good.  For all
file systems that use journalling, we have to make sure the transaction
which freed the blocks is safely committed to the disk platters before
we can call trim, so what you're asking for should be guaranteed if the
file system is otherwise correct with respect to journalling --- or am
I missing something?

Also, to be clear: the block layer will guarantee that a trim/discard
of block 12345 will not be reordered with respect to a write to block
12345, correct?

And on SATA devices, where discard requests are not queued requests,
the ata layer will have to do a queue flush *before* the discard is
sent, right?  But things should be a tiny bit better even with SATA,
because we won't need to wait for the barrier to be acknowledged before
sending more write requests to the device.  If I understand things
correctly, the main place where this will have a benefit will be more
advanced interfaces like SAS?

						- Ted
* Re: discard and barriers
  2010-08-14 14:14 ` Ted Ts'o
@ 2010-08-14 14:52   ` Christoph Hellwig
  2010-08-14 15:46     ` Chris Mason
                       ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Christoph Hellwig @ 2010-08-14 14:52 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Christoph Hellwig, hughd, hirofumi, chris.mason, swhiteho,
	linux-fsdevel, jaxboe, martin.petersen

On Sat, Aug 14, 2010 at 10:14:51AM -0400, Ted Ts'o wrote:
> Also, to be clear: the block layer will guarantee that a trim/discard
> of block 12345 will not be reordered with respect to a write to block
> 12345, correct?

Right now that is what the hardbarrier does, and that's what we're
trying to get rid of.  For XFS we prevent this with something called
the busy extent list - extents deleted by a transaction are inserted
into it (it's actually an rbtree, not a list, these days), and before
we can reuse blocks from it we need to ensure the transaction is fully
committed.  Discards only happen off that list, and extents are only
removed from it once the discard has finished.  I assume other
filesystems have a similar mechanism.

> And on SATA devices, where discard requests are not queued requests,
> the ata layer will have to do a queue flush *before* the discard is
> sent, right?

Yes.

> But things should be a tiny bit better even with SATA,
> because we won't need to wait for the barrier to be acknowledged
> before sending more write requests to the device.  If I understand
> things correctly, the main place where this will have a benefit will
> be more advanced interfaces like SAS?

The performance improvement is primarily interesting for Fibre Channel
or iSCSI attached arrays with thin provisioning support.  I've not seen
a TP-capable SAS device yet.  The other motivation is that this is the
last piece that relies on the ordering semantics of barriers, which
we're trying to get rid of.
* Re: discard and barriers
  2010-08-14 14:52 ` Christoph Hellwig
@ 2010-08-14 15:46   ` Chris Mason
  2010-08-14 17:22     ` Christoph Hellwig
  2010-08-14 20:11     ` Hugh Dickins
  2010-08-15 17:39   ` Ted Ts'o
  2010-08-16  9:41   ` Steven Whitehouse
  2 siblings, 2 replies; 14+ messages in thread
From: Chris Mason @ 2010-08-14 15:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ted Ts'o, hughd, hirofumi, swhiteho, linux-fsdevel, jaxboe,
	martin.petersen

On Sat, Aug 14, 2010 at 04:52:10PM +0200, Christoph Hellwig wrote:
> On Sat, Aug 14, 2010 at 10:14:51AM -0400, Ted Ts'o wrote:
> > Also, to be clear: the block layer will guarantee that a
> > trim/discard of block 12345 will not be reordered with respect to a
> > write to block 12345, correct?
>
> Right now that is what the hardbarrier does, and that's what we're
> trying to get rid of.

So btrfs will wait_on_{page/buffer/bio} to meet all ordering
requirements.  This holds both for transaction commit and for discard.
Reiserfs has the exception you already know about.

> For XFS we prevent this with something called
> the busy extent list - extents deleted by a transaction are inserted
> into it (it's actually an rbtree, not a list, these days), and before
> we can reuse blocks from it we need to ensure the transaction is
> fully committed.  Discards only happen off that list, and extents are
> only removed from it once the discard has finished.  I assume other
> filesystems have a similar mechanism.
>
> > And on SATA devices, where discard requests are not queued requests,
> > the ata layer will have to do a queue flush *before* the discard is
> > sent, right?

Another way to say this is that we have to be 100% sure that if we
write something after a discard, the storage will do that write after
it does the discard.

I'm not actually worried about writes before the discard, because the
worst case for us is that the drive fails to discard something it could
have (this is the drive's problem).  Cache flushes from the FS will
cover the case where transaction commits depend on the data going in
before the discard.

I care a lot about the writes after the discards though.  If the
discards themselves become async, that's ok too, as long as we have
some way to do end_io processing on them.

-chris
* Re: discard and barriers 2010-08-14 15:46 ` Chris Mason @ 2010-08-14 17:22 ` Christoph Hellwig 2010-08-14 20:11 ` Hugh Dickins 1 sibling, 0 replies; 14+ messages in thread From: Christoph Hellwig @ 2010-08-14 17:22 UTC (permalink / raw) To: Chris Mason, Christoph Hellwig, Ted Ts'o, hughd, hirofumi, swhiteho, linux-fsdevel On Sat, Aug 14, 2010 at 11:46:36AM -0400, Chris Mason wrote: > Another way to say this is we have to be 100% sure that if we write > something after a discard, that storage will do that write after it does > the discard. Once we don't have barriers anymore the only way to do that is to wait for the discard to finish before submitting that I/O. > I care a lot about the write after the discards though. If the discards > themselves become async, that's ok too as long as we have some way to do > end_io processing on them. You can just submit the discard bio yourself. That's what I do in the current XFS code. It's a bit awkward that you have to do all the size checking yourself currently, but we could make that a common helper, I just don't want to do that now as it would create all kinds of dependencies for merging the trees in the .37 window. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: discard and barriers 2010-08-14 15:46 ` Chris Mason 2010-08-14 17:22 ` Christoph Hellwig @ 2010-08-14 20:11 ` Hugh Dickins 1 sibling, 0 replies; 14+ messages in thread From: Hugh Dickins @ 2010-08-14 20:11 UTC (permalink / raw) To: Chris Mason, Christoph Hellwig, Ted Ts'o, hughd, hirofumi, swhiteho, linux-fsdevel On Sat, 14 Aug 2010, Chris Mason wrote: > On Sat, Aug 14, 2010 at 04:52:10PM +0200, Christoph Hellwig wrote: > > On Sat, Aug 14, 2010 at 10:14:51AM -0400, Ted Ts'o wrote: > > > Also, to be clear, the block layer will guarantee that a trim/discard > > > of block 12345 will not be reordered with respect to a write block > > > 12345, correct? > > > > Right now that is what the hardbarrier does, and that's what we're > > trying to get rid of. > > So btrfs will wait_on_{page/buffer/bio} to meet all ordering > requirements. This holds both for transaction commit and for discard. > Reiserfs has the exception you already know about. > > > For XFS we prevent this by something that is > > called the busy extent list - extents delete by a transaction are > > inserted into it (it's actually a rbtree not a list these days), > > and before we can reuse blocks from it we need to ensure that it > > is fully commited. discards only happen off that list and extents > > are only removed from it once the discard has finished. I assume > > other filesystems have a similar mechanism. Yes, whatever works for XFS and for btrfs will be enough for swap: all it needs is to wait on completion of the discard, just as you enforced with BLKDEV_IFL_WAIT, before issuing more writes to the discarded area - as it already does. > > > > > And on SATA devices, where discard requests are not queued requests, > > > the ata layer will have to do a queue flush *before* the discard is > > > sent, right? > > Another way to say this is we have to be 100% sure that if we write > something after a discard, that storage will do that write after it does > the discard. 
> > I'm not actually worried about writes before the discard, because the > worst case for us is the drive fails to discard something it could have > (this is the drive's problem). Cache flushes from the FS will cover the > case where transaction commits depend on the data going in before the > discard. That is a great point. Swap does not need nor want a queue flush before discard: all that achieves is interfere with the flow to other partitions. Can we reason that that queue flush cannot be necessary in any case - that anything which appears to need it for correctness must actually be already doing serialization that makes it superfluous? > > I care a lot about the write after the discards though. If the discards > themselves become async, that's ok too as long as we have some way to do > end_io processing on them. Yes, same for swap. Hugh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: discard and barriers
  2010-08-14 14:52 ` Christoph Hellwig
  2010-08-14 15:46   ` Chris Mason
@ 2010-08-15 17:39   ` Ted Ts'o
  2010-08-15 19:02     ` Christoph Hellwig
  2010-08-16  9:41   ` Steven Whitehouse
  2 siblings, 1 reply; 14+ messages in thread
From: Ted Ts'o @ 2010-08-15 17:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: hughd, hirofumi, chris.mason, swhiteho, linux-fsdevel, jaxboe,
	martin.petersen

On Sat, Aug 14, 2010 at 04:52:10PM +0200, Christoph Hellwig wrote:
> On Sat, Aug 14, 2010 at 10:14:51AM -0400, Ted Ts'o wrote:
> > Also, to be clear: the block layer will guarantee that a
> > trim/discard of block 12345 will not be reordered with respect to a
> > write to block 12345, correct?
>
> Right now that is what the hardbarrier does, and that's what we're
> trying to get rid of.  For XFS we prevent this with something called
> the busy extent list - extents deleted by a transaction are inserted
> into it (it's actually an rbtree, not a list, these days), and before
> we can reuse blocks from it we need to ensure the transaction is
> fully committed.  Discards only happen off that list, and extents are
> only removed from it once the discard has finished.  I assume other
> filesystems have a similar mechanism.

So ext4 does the transaction commit (which guarantees that the file
delete has hit the disk platters), and *then* issues the discard, and
*then* we zap the busy extent list.  That's the only safe thing to do,
since if we crash before the transaction gets committed, we lose the
data blocks, so I can't issue the discard until after I wait for the
commit block to finish.  This should be the case regardless of anything
we change with respect to how the discard operation works, since if we
discard and then crash before the commit block is written, data blocks
will get lost that should not be discarded.  Am I missing something?

So after the flush/ordering changes that have been proposed, if the
block device layer is free to reorder a discard and a subsequent write
to a discarded block, I will need to add a *new* wait for the discard
to complete before I can free the busy extent list.  And this will be
true for all file systems that are currently issuing discards.  Again,
am I missing something?

This implies that if the changes allowing the reordering of a discard
and subsequent writes to the discarded blocks go in *before* we update
all of the filesystems, then there is the potential for data loss.  And
while most file systems don't do discards by default but require some
mount option, this still might be considered undesirable.  So that
means we need to add the end-io callbacks to the discard operations
*first*, before we remove the implicit flush/ordering guarantees.

I thought you were saying in your earlier messages that it should be
safe to remove the flush/ordering guarantees, but this is leaving me
quite confused.  Did I misunderstand you?

						- Ted
* Re: discard and barriers
  2010-08-15 17:39 ` Ted Ts'o
@ 2010-08-15 19:02   ` Christoph Hellwig
  2010-08-15 21:25     ` Ted Ts'o
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2010-08-15 19:02 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Christoph Hellwig, hughd, hirofumi, chris.mason, swhiteho,
	linux-fsdevel, jaxboe, martin.petersen

On Sun, Aug 15, 2010 at 01:39:06PM -0400, Ted Ts'o wrote:
> So after the flush/ordering changes that have been proposed, if the
> block device layer is free to reorder a discard and a subsequent
> write to a discarded block, I will need to add a *new* wait for the
> discard to complete before I can free the busy extent list.  And this
> will be true for all file systems that are currently issuing
> discards.  Again, am I missing something?

The above is correct, except for the *new* part.  sb_issue_discard at
the moment is synchronous, so you're already waiting for it to finish.

> So that means we need to add the end-io callbacks to the discard
> operations *first*, before we remove the implicit flush/ordering
> guarantees.

Doing the discard asynchronously with an end_io callback definitely is
an optimization over waiting for it synchronously, and it's in fact
what I'm doing in XFS.  It's however unrelated to getting rid of the
barriers.
* Re: discard and barriers
  2010-08-15 19:02 ` Christoph Hellwig
@ 2010-08-15 21:25   ` Ted Ts'o
  2010-08-15 21:30     ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Ted Ts'o @ 2010-08-15 21:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: hughd, hirofumi, chris.mason, swhiteho, linux-fsdevel, jaxboe,
	martin.petersen

On Sun, Aug 15, 2010 at 09:02:30PM +0200, Christoph Hellwig wrote:
> On Sun, Aug 15, 2010 at 01:39:06PM -0400, Ted Ts'o wrote:
> > So after the flush/ordering changes that have been proposed, if the
> > block device layer is free to reorder a discard and a subsequent
> > write to a discarded block, I will need to add a *new* wait for the
> > discard to complete before I can free the busy extent list.  And
> > this will be true for all file systems that are currently issuing
> > discards.  Again, am I missing something?
>
> The above is correct, except for the *new* part.  sb_issue_discard at
> the moment is synchronous, so you're already waiting for it to finish.

OK, now I understand why I'm confused.  I thought the proposal was to
change sb_issue_discard() to make it asynchronous?  Really, what we're
talking about here is eliminating the explicit barrier/SYNCHRONIZE
CACHE from the discard, correct?  The sb_issue_discard() call will
still remain synchronous.

						- Ted
* Re: discard and barriers
  2010-08-15 21:25 ` Ted Ts'o
@ 2010-08-15 21:30   ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2010-08-15 21:30 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Christoph Hellwig, hughd, hirofumi, chris.mason, swhiteho,
	linux-fsdevel, jaxboe, martin.petersen

On Sun, Aug 15, 2010 at 05:25:34PM -0400, Ted Ts'o wrote:
> OK, now I understand why I'm confused.  I thought the proposal was to
> change sb_issue_discard() to make it asynchronous?  Really, what
> we're talking about here is eliminating the explicit
> barrier/SYNCHRONIZE CACHE from the discard, correct?  The
> sb_issue_discard() call will still remain synchronous.

Yes, at least for now.  I don't think keeping it that way over the long
run is a good idea, but for now getting rid of the barrier is all that
*needs* to be done.  The rest is optimizations that can be done later.
* Re: discard and barriers
  2010-08-14 14:52 ` Christoph Hellwig
  2010-08-14 15:46   ` Chris Mason
  2010-08-15 17:39   ` Ted Ts'o
@ 2010-08-16  9:41   ` Steven Whitehouse
  2010-08-16 11:26     ` Christoph Hellwig
  2 siblings, 1 reply; 14+ messages in thread
From: Steven Whitehouse @ 2010-08-16 9:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ted Ts'o, hughd, hirofumi, chris.mason, linux-fsdevel, jaxboe,
	martin.petersen

Hi,

On Sat, 2010-08-14 at 16:52 +0200, Christoph Hellwig wrote:
> On Sat, Aug 14, 2010 at 10:14:51AM -0400, Ted Ts'o wrote:
> > Also, to be clear: the block layer will guarantee that a
> > trim/discard of block 12345 will not be reordered with respect to a
> > write to block 12345, correct?
>
> Right now that is what the hardbarrier does, and that's what we're
> trying to get rid of.  For XFS we prevent this with something called
> the busy extent list - extents deleted by a transaction are inserted
> into it (it's actually an rbtree, not a list, these days), and before
> we can reuse blocks from it we need to ensure the transaction is
> fully committed.  Discards only happen off that list, and extents are
> only removed from it once the discard has finished.  I assume other
> filesystems have a similar mechanism.
>
GFS2 has a similar concept, which compares two bitmaps to generate the
extent list for the discards.  This is done after each resource group
has been committed to the journal, and just before the resource group
bitmap is updated with the newly freed blocks (and marked dirty).

Any remote node wanting to use that new space will cause a further
journal flush when it requests the resource group lock (as well as
in-place write back of that resource group, of course).

If the local node wants to reuse the recently freed space, then that
can happen as soon as the log commit has finished, so in this case the
barrier and the waiting are required.

At the moment it seems to be doing that on every request; however,
there is no reason why we couldn't move the barrier to the end of the
log flush code and have one per log flush, conditional upon a discard
having been issued (or some equivalent construct, bearing in mind the
objective of removing barriers),

Steve.
* Re: discard and barriers
  2010-08-16  9:41 ` Steven Whitehouse
@ 2010-08-16 11:26   ` Christoph Hellwig
  2010-08-17 10:59     ` Steven Whitehouse
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2010-08-16 11:26 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Christoph Hellwig, Ted Ts'o, hughd, hirofumi, chris.mason,
	linux-fsdevel, jaxboe, martin.petersen

On Mon, Aug 16, 2010 at 10:41:51AM +0100, Steven Whitehouse wrote:
> GFS2 has a similar concept, which compares two bitmaps to generate
> the extent list for the discards.  This is done after each resource
> group has been committed to the journal, and just before the resource
> group bitmap is updated with the newly freed blocks (and marked
> dirty).
>
> Any remote node wanting to use that new space will cause a further
> journal flush when it requests the resource group lock (as well as
> in-place write back of that resource group, of course).
>
> If the local node wants to reuse the recently freed space, then that
> can happen as soon as the log commit has finished, so in this case
> the barrier and the waiting are required.

I don't think you need the barrier for that.  The wait means the
discard has finished, and from that point on writes to the blocks
discarded are safe.  There's no need to flush the volatile write cache
after a discard either.  The big question is whether you need the drain
before the discard.  Given that you do a log commit first, I suspect
not, as the log commit waits on all I/Os related to that commit.
* Re: discard and barriers
  2010-08-16 11:26 ` Christoph Hellwig
@ 2010-08-17 10:59   ` Steven Whitehouse
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Whitehouse @ 2010-08-17 10:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ted Ts'o, hughd, hirofumi, chris.mason, linux-fsdevel, jaxboe,
	martin.petersen

Hi,

On Mon, 2010-08-16 at 13:26 +0200, Christoph Hellwig wrote:
> On Mon, Aug 16, 2010 at 10:41:51AM +0100, Steven Whitehouse wrote:
> > GFS2 has a similar concept, which compares two bitmaps to generate
> > the extent list for the discards.  This is done after each resource
> > group has been committed to the journal, and just before the
> > resource group bitmap is updated with the newly freed blocks (and
> > marked dirty).
> >
> > Any remote node wanting to use that new space will cause a further
> > journal flush when it requests the resource group lock (as well as
> > in-place write back of that resource group, of course).
> >
> > If the local node wants to reuse the recently freed space, then
> > that can happen as soon as the log commit has finished, so in this
> > case the barrier and the waiting are required.
>
> I don't think you need the barrier for that.  The wait means the
> discard has finished, and from that point on writes to the blocks
> discarded are safe.  There's no need to flush the volatile write
> cache after a discard either.  The big question is whether you need
> the drain before the discard.  Given that you do a log commit first,
> I suspect not, as the log commit waits on all I/Os related to that
> commit.
>
Yes, that sounds reasonable in that case.  I didn't know there were
reordering guarantees for discards vs. normal I/O, so in that case
there is no need for the barrier.  I don't think anything more is
required here, so we should be good to go,

Steve.
* Re: discard and barriers
  2010-08-14 11:56 discard and barriers Christoph Hellwig
  2010-08-14 14:14 ` Ted Ts'o
@ 2010-08-23 16:42 ` Christoph Hellwig
  1 sibling, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2010-08-23 16:42 UTC (permalink / raw)
  To: jaxboe
  Cc: hughd, hirofumi, tytso, chris.mason, swhiteho, martin.petersen,
	linux-fsdevel

Jens, do you think the following patch is still okay for 2.6.36?  It
fixes the massive performance regression due to the full barriers for
discards, and also makes the barrier removal in 2.6.37 a lot simpler.

---
From: Christoph Hellwig <hch@lst.de>
Subject: [PATCH] block: do not send discards as barriers

There is no reason to send discards as barriers.  The rationale is
easy: the current barriers do two things, flushing caches and
providing ordering.  There is no reason for a cache flush before a
discard, because a discard doesn't care about ordering versus writes
to regions other than the one it discards, and there is no reason for
a cache flush afterwards, as a discard doesn't store data.  The
ordering semantics aren't used currently, because all discards are
done synchronously and thus we order explicitly.  Even more, until
very late in the 2.6.35 cycle we didn't send DISCARD_BARRIER requests
as real hardbarriers but as elevator-only barriers, which don't
provide the above semantics, and the switch to real barriers caused
massive performance regressions, especially for the swap code.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/block/blk-lib.c
===================================================================
--- linux-2.6.orig/block/blk-lib.c	2010-08-23 15:16:37.656033352 +0200
+++ linux-2.6/block/blk-lib.c	2010-08-23 15:16:45.913012470 +0200
@@ -39,8 +39,7 @@ int blkdev_issue_discard(struct block_de
 {
 	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request_queue *q = bdev_get_queue(bdev);
-	int type = flags & BLKDEV_IFL_BARRIER ?
-		DISCARD_BARRIER : DISCARD_NOBARRIER;
+	int type = REQ_WRITE | REQ_DISCARD;
 	unsigned int max_discard_sectors;
 	struct bio *bio;
 	int ret = 0;
@@ -65,7 +64,7 @@ int blkdev_issue_discard(struct block_de
 	if (flags & BLKDEV_IFL_SECURE) {
 		if (!blk_queue_secdiscard(q))
 			return -EOPNOTSUPP;
-		type |= DISCARD_SECURE;
+		type |= REQ_SECURE;
 	}
 
 	while (nr_sects && !ret) {
@@ -162,12 +161,6 @@ int blkdev_issue_zeroout(struct block_de
 	bb.wait = &wait;
 	bb.end_io = NULL;
 
-	if (flags & BLKDEV_IFL_BARRIER) {
-		/* issue async barrier before the data */
-		ret = blkdev_issue_flush(bdev, gfp_mask, NULL, 0);
-		if (ret)
-			return ret;
-	}
 submit:
 	ret = 0;
 	while (nr_sects != 0) {
@@ -199,13 +192,6 @@ submit:
 		issued++;
 		submit_bio(WRITE, bio);
 	}
-	/*
-	 * When all data bios are in flight. Send final barrier if requeted.
-	 */
-	if (nr_sects == 0 && flags & BLKDEV_IFL_BARRIER)
-		ret = blkdev_issue_flush(bdev, gfp_mask, NULL,
-					flags & BLKDEV_IFL_WAIT);
-
 	if (flags & BLKDEV_IFL_WAIT)
 		/* Wait for bios in-flight */
Index: linux-2.6/fs/btrfs/extent-tree.c
===================================================================
--- linux-2.6.orig/fs/btrfs/extent-tree.c	2010-08-23 15:16:37.677004089 +0200
+++ linux-2.6/fs/btrfs/extent-tree.c	2010-08-23 15:16:45.918004368 +0200
@@ -1696,7 +1696,7 @@ static void btrfs_issue_discard(struct b
 			u64 start, u64 len)
 {
 	blkdev_issue_discard(bdev, start >> 9, len >> 9, GFP_KERNEL,
-			BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER);
+			BLKDEV_IFL_WAIT);
 }
 
 static int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr,
Index: linux-2.6/fs/gfs2/rgrp.c
===================================================================
--- linux-2.6.orig/fs/gfs2/rgrp.c	2010-08-23 15:16:37.684004298 +0200
+++ linux-2.6/fs/gfs2/rgrp.c	2010-08-23 15:16:45.927004298 +0200
@@ -854,8 +854,7 @@ static void gfs2_rgrp_send_discards(stru
 			if ((start + nr_sects) != blk) {
 				rv = blkdev_issue_discard(bdev, start,
 							nr_sects, GFP_NOFS,
-							BLKDEV_IFL_WAIT |
-							BLKDEV_IFL_BARRIER);
+							BLKDEV_IFL_WAIT);
 				if (rv)
 					goto fail;
 				nr_sects = 0;
@@ -870,7 +869,7 @@ start_new_extent:
 	}
 	if (nr_sects) {
 		rv = blkdev_issue_discard(bdev, start, nr_sects, GFP_NOFS,
-					 BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER);
+					 BLKDEV_IFL_WAIT);
 		if (rv)
 			goto fail;
 	}
Index: linux-2.6/fs/nilfs2/the_nilfs.c
===================================================================
--- linux-2.6.orig/fs/nilfs2/the_nilfs.c	2010-08-23 15:16:37.692003949 +0200
+++ linux-2.6/fs/nilfs2/the_nilfs.c	2010-08-23 15:16:45.928014774 +0200
@@ -775,8 +775,7 @@ int nilfs_discard_segments(struct the_ni
 						   start * sects_per_block,
 						   nblocks * sects_per_block,
 						   GFP_NOFS,
-						   BLKDEV_IFL_WAIT |
-						   BLKDEV_IFL_BARRIER);
+						   BLKDEV_IFL_WAIT);
 			if (ret < 0)
 				return ret;
 			nblocks = 0;
@@ -787,7 +786,7 @@ int nilfs_discard_segments(struct the_ni
 					  start * sects_per_block,
 					  nblocks * sects_per_block,
 					  GFP_NOFS,
-					  BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER);
+					  BLKDEV_IFL_WAIT);
 
 	return ret;
 }
Index: linux-2.6/include/linux/blkdev.h
===================================================================
--- linux-2.6.orig/include/linux/blkdev.h	2010-08-23 15:16:37.700013168 +0200
+++ linux-2.6/include/linux/blkdev.h	2010-08-23 15:16:45.931003739 +0200
@@ -921,11 +921,9 @@ static inline struct request *blk_map_qu
 }
 enum{
 	BLKDEV_WAIT,	/* wait for completion */
-	BLKDEV_BARRIER,	/* issue request with barrier */
 	BLKDEV_SECURE,	/* secure discard */
 };
 #define BLKDEV_IFL_WAIT		(1 << BLKDEV_WAIT)
-#define BLKDEV_IFL_BARRIER	(1 << BLKDEV_BARRIER)
 #define BLKDEV_IFL_SECURE	(1 << BLKDEV_SECURE)
 extern int blkdev_issue_flush(struct block_device *, gfp_t, sector_t *,
 			unsigned long);
@@ -939,7 +937,7 @@ static inline int sb_issue_discard(struc
 	block <<= (sb->s_blocksize_bits - 9);
 	nr_blocks <<= (sb->s_blocksize_bits - 9);
 	return blkdev_issue_discard(sb->s_bdev, block, nr_blocks, GFP_NOFS,
-				   BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER);
+				   BLKDEV_IFL_WAIT);
 }
 extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-08-23 15:16:37.709012819 +0200
+++ linux-2.6/include/linux/fs.h	2010-08-23 15:16:45.938005835 +0200
@@ -159,14 +159,6 @@ struct inodes_stat_t {
 #define WRITE_BARRIER		(WRITE | REQ_SYNC | REQ_NOIDLE | REQ_UNPLUG | \
 				 REQ_HARDBARRIER)
 
-/*
- * These aren't really reads or writes, they pass down information about
- * parts of device that are now unused by the file system.
- */
-#define DISCARD_NOBARRIER	(WRITE | REQ_DISCARD)
-#define DISCARD_BARRIER	(WRITE | REQ_DISCARD | REQ_HARDBARRIER)
-#define DISCARD_SECURE	(DISCARD_NOBARRIER | REQ_SECURE)
-
 #define SEL_IN		1
 #define SEL_OUT		2
 #define SEL_EX		4
Index: linux-2.6/mm/swapfile.c
===================================================================
--- linux-2.6.orig/mm/swapfile.c	2010-08-23 15:16:37.724015124 +0200
+++ linux-2.6/mm/swapfile.c	2010-08-23 15:16:45.948004996 +0200
@@ -142,7 +142,7 @@ static int discard_swap(struct swap_info
 		if (nr_blocks) {
 			err = blkdev_issue_discard(si->bdev, start_block,
 				nr_blocks, GFP_KERNEL,
-				BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER);
+				BLKDEV_IFL_WAIT);
 			if (err)
 				return err;
 			cond_resched();
@@ -154,7 +154,7 @@ static int discard_swap(struct swap_info
 		err = blkdev_issue_discard(si->bdev, start_block,
 				nr_blocks, GFP_KERNEL,
-				BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER);
+				BLKDEV_IFL_WAIT);
 		if (err)
 			break;
 
@@ -193,8 +193,7 @@ static void discard_swap_cluster(struct
 		start_block <<= PAGE_SHIFT - 9;
 		nr_blocks <<= PAGE_SHIFT - 9;
 		if (blkdev_issue_discard(si->bdev, start_block,
-				nr_blocks, GFP_NOIO, BLKDEV_IFL_WAIT |
-				BLKDEV_IFL_BARRIER))
+				nr_blocks, GFP_NOIO, BLKDEV_IFL_WAIT))
 			break;
 	}