From mboxrd@z Thu Jan  1 00:00:00 1970
From: Chris Mason <chris.mason@oracle.com>
Subject: Re: [PATCH] block: fix DISCARD_BARRIER requests
Date: Fri, 18 Jun 2010 16:30:29 -0400
Message-ID: <20100618203029.GA27466@think>
References: <20100617075432.GA22407@lst.de>
 <4C19D86A.5030709@kernel.dk>
 <20100617165453.GA15824@lst.de>
 <20100617192217.GT27466@think>
 <20100618152928.GB10919@shareable.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
	linux-fsdevel@vger.kernel.org
To: Jamie Lokier <jamie@shareable.org>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from rcsinet10.oracle.com ([148.87.113.121]:41908 "EHLO
	rcsinet10.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750818Ab0FRUcO (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Fri, 18 Jun 2010 16:32:14 -0400
Content-Disposition: inline
In-Reply-To: <20100618152928.GB10919@shareable.org>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Fri, Jun 18, 2010 at 04:29:28PM +0100, Jamie Lokier wrote:
> Chris Mason wrote:
> > On Thu, Jun 17, 2010 at 06:54:53PM +0200, Christoph Hellwig wrote:
> > > On Thu, Jun 17, 2010 at 10:10:18AM +0200, Jens Axboe wrote:
> > > > Thanks, applied. There was a recent problem report on btrfs using
> > > > discard, could possibly explain it if Chris assumed it was a full
> > > > barrier.
> > > 
> > > We actually have a much bigger issue with the DISCARD_BARRIER type.
> > > If the discard request needs to get split into multiple smaller ones
> > > we don't keep the queue drained atomically around them, so requests
> > > could sneak in between them.  Depending on how the realtime discard
> > > is implemented that could cause issues.  In my XFS prototype for it
> > > I only deleted the extents from the tracking betree after the discard
> > > request has returned, but if other filesystems rely on full barrier
> > > semantics of DISCARD_BARRIER this could cause real problems.
> > 
> > btrfs needs to know that a write after the discard returns won't cross
> > the discard, but beyond that we're happy with anything.
> 
> Is it acceptable for the write to move earlier than a discard that it
> doesn't overlap?  In other words, would a range-dependent barrier be
> sufficient (hypothetically, for some future elevator / multi-disk
> optimisation).
> 
> I guess the answer to that depends on whether you're queuing a metadata
> write to record some fact about the discard which shouldn't reach the
> storage until the discard is confirmed done.

It's really just the sector we're discarding that matters.  So if I
discard sector xxyyzz and then write the same sector, please make sure
the discard is done before you put down my new contents ;)
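
A rough sketch of that rule, in case it helps make the range-dependent
idea concrete.  The struct and function names here are made up for
illustration and are not block layer API; it only shows the overlap
test, assuming half-open sector ranges:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open sector range [start, start + nr_sectors). */
struct sector_range {
	uint64_t start;
	uint64_t nr_sectors;
};

/* True if the two ranges share at least one sector. */
static bool ranges_overlap(struct sector_range a, struct sector_range b)
{
	/* Precondition for this sketch: range ends do not wrap around. */
	assert(a.start + a.nr_sectors >= a.start);
	assert(b.start + b.nr_sectors >= b.start);

	/* An empty range cannot share a sector with anything. */
	if (a.nr_sectors == 0 || b.nr_sectors == 0)
		return false;

	return a.start < b.start + b.nr_sectors &&
	       b.start < a.start + a.nr_sectors;
}

/*
 * A write submitted after a discard may be reordered ahead of it only
 * if the two do not touch any of the same sectors.  If they overlap,
 * the discard has to complete first.
 */
static bool write_may_pass_discard(struct sector_range discard,
				   struct sector_range write)
{
	return !ranges_overlap(discard, write);
}
```

So a write to the discarded sectors has to wait, while a write
somewhere else on the disk is free to move.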

-chris