[PATCH] block: fix DISCARD

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] block: fix DISCARD_BARRIER requests
@ 2010-06-17  7:54 Christoph Hellwig
  2010-06-17  8:10 ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2010-06-17  7:54 UTC (permalink / raw)
  To: axboe; +Cc: linux-fsdevel

Filesystems assume that DISCARD_BARRIER are full barriers, so that they
don't have to track in-progress discard operation when submitting new I/O.
But currently we only treat them as elevator barriers, which don't
actually do the nessecary queue drains.

Also remove the unlikely around both the DISCARD and BARRIER requests -
the happen far too often for a static mispredict.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/block/blk-core.c
===================================================================
--- linux-2.6.orig/block/blk-core.c	2010-06-17 09:42:23.924004448 +0200
+++ linux-2.6/block/blk-core.c	2010-06-17 09:45:46.727004030 +0200
@@ -1149,13 +1149,10 @@ void init_request_from_bio(struct reques
 	else
 		req->cmd_flags |= bio->bi_rw & REQ_FAILFAST_MASK;
 
-	if (unlikely(bio_rw_flagged(bio, BIO_RW_DISCARD))) {
+	if (bio_rw_flagged(bio, BIO_RW_DISCARD))
 		req->cmd_flags |= REQ_DISCARD;
-		if (bio_rw_flagged(bio, BIO_RW_BARRIER))
-			req->cmd_flags |= REQ_SOFTBARRIER;
-	} else if (unlikely(bio_rw_flagged(bio, BIO_RW_BARRIER)))
+	if (bio_rw_flagged(bio, BIO_RW_BARRIER))
 		req->cmd_flags |= REQ_HARDBARRIER;
-
 	if (bio_rw_flagged(bio, BIO_RW_SYNCIO))
 		req->cmd_flags |= REQ_RW_SYNC;
 	if (bio_rw_flagged(bio, BIO_RW_META))

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: fix DISCARD_BARRIER requests
  2010-06-17  7:54 [PATCH] block: fix DISCARD_BARRIER requests Christoph Hellwig
@ 2010-06-17  8:10 ` Jens Axboe
  2010-06-17 11:49   ` Matthew Wilcox
  2010-06-17 16:54   ` Christoph Hellwig
  0 siblings, 2 replies; 10+ messages in thread
From: Jens Axboe @ 2010-06-17  8:10 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, Chris Mason

On 2010-06-17 09:54, Christoph Hellwig wrote:
> Filesystems assume that DISCARD_BARRIER are full barriers, so that they
> don't have to track in-progress discard operation when submitting new I/O.
> But currently we only treat them as elevator barriers, which don't
> actually do the nessecary queue drains.
> 
> Also remove the unlikely around both the DISCARD and BARRIER requests -
> the happen far too often for a static mispredict.

Thanks, applied. There was a recent problem report on btrfs using
discard, could possibly explain it if Chris assumed it was a full
barrier.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: fix DISCARD_BARRIER requests
  2010-06-17  8:10 ` Jens Axboe
@ 2010-06-17 11:49   ` Matthew Wilcox
  2010-06-17 12:06     ` Jens Axboe
  2010-06-17 16:54   ` Christoph Hellwig
  1 sibling, 1 reply; 10+ messages in thread
From: Matthew Wilcox @ 2010-06-17 11:49 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, linux-fsdevel, Chris Mason

On Thu, Jun 17, 2010 at 10:10:18AM +0200, Jens Axboe wrote:
> On 2010-06-17 09:54, Christoph Hellwig wrote:
> > Filesystems assume that DISCARD_BARRIER are full barriers, so that they
> > don't have to track in-progress discard operation when submitting new I/O.
> > But currently we only treat them as elevator barriers, which don't
> > actually do the nessecary queue drains.
> > 
> > Also remove the unlikely around both the DISCARD and BARRIER requests -
> > the happen far too often for a static mispredict.
> 
> Thanks, applied. There was a recent problem report on btrfs using
> discard, could possibly explain it if Chris assumed it was a full
> barrier.

If it was on real hardware (ie a SATA disc), it can't be this problem,
since TRIM isn't NCQ so there was already an implicit queue drain ahead
and behind the TRIM.  I hear rumours of an NCQ TRIM coming 'RSN', but
haven't seen any details on it yet.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: fix DISCARD_BARRIER requests
  2010-06-17 11:49   ` Matthew Wilcox
@ 2010-06-17 12:06     ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2010-06-17 12:06 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Christoph Hellwig, linux-fsdevel, Chris Mason

On 2010-06-17 13:49, Matthew Wilcox wrote:
> On Thu, Jun 17, 2010 at 10:10:18AM +0200, Jens Axboe wrote:
>> On 2010-06-17 09:54, Christoph Hellwig wrote:
>>> Filesystems assume that DISCARD_BARRIER are full barriers, so that they
>>> don't have to track in-progress discard operation when submitting new I/O.
>>> But currently we only treat them as elevator barriers, which don't
>>> actually do the nessecary queue drains.
>>>
>>> Also remove the unlikely around both the DISCARD and BARRIER requests -
>>> the happen far too often for a static mispredict.
>>
>> Thanks, applied. There was a recent problem report on btrfs using
>> discard, could possibly explain it if Chris assumed it was a full
>> barrier.
> 
> If it was on real hardware (ie a SATA disc), it can't be this problem,
> since TRIM isn't NCQ so there was already an implicit queue drain ahead
> and behind the TRIM.  I hear rumours of an NCQ TRIM coming 'RSN', but
> haven't seen any details on it yet.

Good point, it will drain the queue on SATA of course. The problem was
reported on MMC, I doubt the do queuing at all (though I have not
checked).

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: fix DISCARD_BARRIER requests
  2010-06-17  8:10 ` Jens Axboe
  2010-06-17 11:49   ` Matthew Wilcox
@ 2010-06-17 16:54   ` Christoph Hellwig
  2010-06-17 19:22     ` Chris Mason
  1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2010-06-17 16:54 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, linux-fsdevel, Chris Mason

On Thu, Jun 17, 2010 at 10:10:18AM +0200, Jens Axboe wrote:
> Thanks, applied. There was a recent problem report on btrfs using
> discard, could possibly explain it if Chris assumed it was a full
> barrier.

We actually have a much bigger issue with the DISCARD_BARRIER type.
If the discard request needs to get split into multiple smaller ones
we don't keep the queue drained atomically around them, so requests
could sneak inbetween them.  Depending on how the realtime discard
is implemented that could cause issues.  In my XFS prototype for it
I only deleted the extents from the tracking betree after the discard
request has returned, but other filesystems rely on full barrier
semantics of DISCARD_BARRIER this could cause real problems.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: fix DISCARD_BARRIER requests
  2010-06-17 16:54   ` Christoph Hellwig
@ 2010-06-17 19:22     ` Chris Mason
  2010-06-18 13:29       ` Kyungmin Park
  2010-06-18 15:29       ` Jamie Lokier
  0 siblings, 2 replies; 10+ messages in thread
From: Chris Mason @ 2010-06-17 19:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-fsdevel

On Thu, Jun 17, 2010 at 06:54:53PM +0200, Christoph Hellwig wrote:
> On Thu, Jun 17, 2010 at 10:10:18AM +0200, Jens Axboe wrote:
> > Thanks, applied. There was a recent problem report on btrfs using
> > discard, could possibly explain it if Chris assumed it was a full
> > barrier.
> 
> We actually have a much bigger issue with the DISCARD_BARRIER type.
> If the discard request needs to get split into multiple smaller ones
> we don't keep the queue drained atomically around them, so requests
> could sneak inbetween them.  Depending on how the realtime discard
> is implemented that could cause issues.  In my XFS prototype for it
> I only deleted the extents from the tracking betree after the discard
> request has returned, but other filesystems rely on full barrier
> semantics of DISCARD_BARRIER this could cause real problems.

btrfs needs to know that a write after the discard returns won't cross
the discard, but beyond that we're happy with anything.

-chris

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: fix DISCARD_BARRIER requests
  2010-06-17 19:22     ` Chris Mason
@ 2010-06-18 13:29       ` Kyungmin Park
  2010-06-19  2:22         ` Kyungmin Park
  2010-06-18 15:29       ` Jamie Lokier
  1 sibling, 1 reply; 10+ messages in thread
From: Kyungmin Park @ 2010-06-18 13:29 UTC (permalink / raw)
  To: Chris Mason, Christoph Hellwig, Jens Axboe, linux-fsdevel

On Fri, Jun 18, 2010 at 4:22 AM, Chris Mason <chris.mason@oracle.com> wrote:
> On Thu, Jun 17, 2010 at 06:54:53PM +0200, Christoph Hellwig wrote:
>> On Thu, Jun 17, 2010 at 10:10:18AM +0200, Jens Axboe wrote:
>> > Thanks, applied. There was a recent problem report on btrfs using
>> > discard, could possibly explain it if Chris assumed it was a full
>> > barrier.
>>
>> We actually have a much bigger issue with the DISCARD_BARRIER type.
>> If the discard request needs to get split into multiple smaller ones
>> we don't keep the queue drained atomically around them, so requests
>> could sneak inbetween them.  Depending on how the realtime discard
>> is implemented that could cause issues.  In my XFS prototype for it
>> I only deleted the extents from the tracking betree after the discard
>> request has returned, but other filesystems rely on full barrier
>> semantics of DISCARD_BARRIER this could cause real problems.
>
> btrfs needs to know that a write after the discard returns won't cross
> the discard, but beyond that we're happy with anything.

I tested btrfs on MMC with discard support and get failed. you can
find a scenario and details at btrfs mailing list.
I'll check it with this patch.
Please note that MMC don't have any queue at internal as SSD.

Thank you,
Kyungmin Park
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: fix DISCARD_BARRIER requests
  2010-06-17 19:22     ` Chris Mason
  2010-06-18 13:29       ` Kyungmin Park
@ 2010-06-18 15:29       ` Jamie Lokier
  2010-06-18 20:30         ` Chris Mason
  1 sibling, 1 reply; 10+ messages in thread
From: Jamie Lokier @ 2010-06-18 15:29 UTC (permalink / raw)
  To: Chris Mason, Christoph Hellwig, Jens Axboe, linux-fsdevel

Chris Mason wrote:
> On Thu, Jun 17, 2010 at 06:54:53PM +0200, Christoph Hellwig wrote:
> > On Thu, Jun 17, 2010 at 10:10:18AM +0200, Jens Axboe wrote:
> > > Thanks, applied. There was a recent problem report on btrfs using
> > > discard, could possibly explain it if Chris assumed it was a full
> > > barrier.
> > 
> > We actually have a much bigger issue with the DISCARD_BARRIER type.
> > If the discard request needs to get split into multiple smaller ones
> > we don't keep the queue drained atomically around them, so requests
> > could sneak inbetween them.  Depending on how the realtime discard
> > is implemented that could cause issues.  In my XFS prototype for it
> > I only deleted the extents from the tracking betree after the discard
> > request has returned, but other filesystems rely on full barrier
> > semantics of DISCARD_BARRIER this could cause real problems.
> 
> btrfs needs to know that a write after the discard returns won't cross
> the discard, but beyond that we're happy with anything.

Is it acceptable for the write to move earlier than a discard that it
doesn't overlap?  In other words, would a range-dependent barrier be
sufficient (hypothetically, for some future elevator / multi-disk
optimisation).

I guess answer to that depends on whether you're queuing a metadata
write to record some fact about the discard which shouldn't reach the
storage until the discard is confirmed done.

-- Jamie

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: fix DISCARD_BARRIER requests
  2010-06-18 15:29       ` Jamie Lokier
@ 2010-06-18 20:30         ` Chris Mason
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Mason @ 2010-06-18 20:30 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Christoph Hellwig, Jens Axboe, linux-fsdevel

On Fri, Jun 18, 2010 at 04:29:28PM +0100, Jamie Lokier wrote:
> Chris Mason wrote:
> > On Thu, Jun 17, 2010 at 06:54:53PM +0200, Christoph Hellwig wrote:
> > > On Thu, Jun 17, 2010 at 10:10:18AM +0200, Jens Axboe wrote:
> > > > Thanks, applied. There was a recent problem report on btrfs using
> > > > discard, could possibly explain it if Chris assumed it was a full
> > > > barrier.
> > > 
> > > We actually have a much bigger issue with the DISCARD_BARRIER type.
> > > If the discard request needs to get split into multiple smaller ones
> > > we don't keep the queue drained atomically around them, so requests
> > > could sneak inbetween them.  Depending on how the realtime discard
> > > is implemented that could cause issues.  In my XFS prototype for it
> > > I only deleted the extents from the tracking betree after the discard
> > > request has returned, but other filesystems rely on full barrier
> > > semantics of DISCARD_BARRIER this could cause real problems.
> > 
> > btrfs needs to know that a write after the discard returns won't cross
> > the discard, but beyond that we're happy with anything.
> 
> Is it acceptable for the write to move earlier than a discard that it
> doesn't overlap?  In other words, would a range-dependent barrier be
> sufficient (hypothetically, for some future elevator / multi-disk
> optimisation).
> 
> I guess answer to that depends on whether you're queuing a metadata
> write to record some fact about the discard which shouldn't reach the
> storage until the discard is confirmed done.

It's really just the sector we're discarding that matters.  So if I
discard sector xxyyzz and then write the same sector, please make sure
the discard is done before you put down my new contents ;)

-chris


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: fix DISCARD_BARRIER requests
  2010-06-18 13:29       ` Kyungmin Park
@ 2010-06-19  2:22         ` Kyungmin Park
  0 siblings, 0 replies; 10+ messages in thread
From: Kyungmin Park @ 2010-06-19  2:22 UTC (permalink / raw)
  To: Chris Mason, Christoph Hellwig, Jens Axboe, linux-fsdevel

On Fri, Jun 18, 2010 at 10:29 PM, Kyungmin Park <kmpark@infradead.org> wrote:
> On Fri, Jun 18, 2010 at 4:22 AM, Chris Mason <chris.mason@oracle.com> wrote:
>> On Thu, Jun 17, 2010 at 06:54:53PM +0200, Christoph Hellwig wrote:
>>> On Thu, Jun 17, 2010 at 10:10:18AM +0200, Jens Axboe wrote:
>>> > Thanks, applied. There was a recent problem report on btrfs using
>>> > discard, could possibly explain it if Chris assumed it was a full
>>> > barrier.
>>>
>>> We actually have a much bigger issue with the DISCARD_BARRIER type.
>>> If the discard request needs to get split into multiple smaller ones
>>> we don't keep the queue drained atomically around them, so requests
>>> could sneak inbetween them.  Depending on how the realtime discard
>>> is implemented that could cause issues.  In my XFS prototype for it
>>> I only deleted the extents from the tracking betree after the discard
>>> request has returned, but other filesystems rely on full barrier
>>> semantics of DISCARD_BARRIER this could cause real problems.
>>
>> btrfs needs to know that a write after the discard returns won't cross
>> the discard, but beyond that we're happy with anything.
>
> I tested btrfs on MMC with discard support and get failed. you can
> find a scenario and details at btrfs mailing list.
> I'll check it with this patch.
> Please note that MMC don't have any queue at internal as SSD.
>

It's same it failed to use discard on MMC with btrfs.

Thank you,
Kyungmin Park
>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2010-06-19  2:22 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-06-17  7:54 [PATCH] block: fix DISCARD_BARRIER requests Christoph Hellwig
2010-06-17  8:10 ` Jens Axboe
2010-06-17 11:49   ` Matthew Wilcox
2010-06-17 12:06     ` Jens Axboe
2010-06-17 16:54   ` Christoph Hellwig
2010-06-17 19:22     ` Chris Mason
2010-06-18 13:29       ` Kyungmin Park
2010-06-19  2:22         ` Kyungmin Park
2010-06-18 15:29       ` Jamie Lokier
2010-06-18 20:30         ` Chris Mason

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).