From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Mason
Subject: Re: [PATCH] block: fix DISCARD_BARRIER requests
Date: Fri, 18 Jun 2010 16:30:29 -0400
Message-ID: <20100618203029.GA27466@think>
References: <20100617075432.GA22407@lst.de> <4C19D86A.5030709@kernel.dk> <20100617165453.GA15824@lst.de> <20100617192217.GT27466@think> <20100618152928.GB10919@shareable.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Christoph Hellwig, Jens Axboe, linux-fsdevel@vger.kernel.org
To: Jamie Lokier
Return-path:
Received: from rcsinet10.oracle.com ([148.87.113.121]:41908 "EHLO rcsinet10.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750818Ab0FRUcO (ORCPT ); Fri, 18 Jun 2010 16:32:14 -0400
Content-Disposition: inline
In-Reply-To: <20100618152928.GB10919@shareable.org>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Fri, Jun 18, 2010 at 04:29:28PM +0100, Jamie Lokier wrote:
> Chris Mason wrote:
> > On Thu, Jun 17, 2010 at 06:54:53PM +0200, Christoph Hellwig wrote:
> > > On Thu, Jun 17, 2010 at 10:10:18AM +0200, Jens Axboe wrote:
> > > > Thanks, applied. There was a recent problem report on btrfs using
> > > > discard, could possibly explain it if Chris assumed it was a full
> > > > barrier.
> > >
> > > We actually have a much bigger issue with the DISCARD_BARRIER type.
> > > If the discard request needs to get split into multiple smaller ones
> > > we don't keep the queue drained atomically around them, so requests
> > > could sneak in between them. Depending on how the realtime discard
> > > is implemented that could cause issues. In my XFS prototype for it
> > > I only deleted the extents from the tracking btree after the discard
> > > request has returned, but if other filesystems rely on full barrier
> > > semantics of DISCARD_BARRIER this could cause real problems.
> >
> > btrfs needs to know that a write after the discard returns won't cross
> > the discard, but beyond that we're happy with anything.
>
> Is it acceptable for the write to move earlier than a discard that it
> doesn't overlap? In other words, would a range-dependent barrier be
> sufficient (hypothetically, for some future elevator / multi-disk
> optimisation)?
>
> I guess the answer to that depends on whether you're queuing a metadata
> write to record some fact about the discard which shouldn't reach the
> storage until the discard is confirmed done.

It's really just the sector we're discarding that matters. So if I
discard sector xxyyzz and then write the same sector, please make sure
the discard is done before you put down my new contents ;)

-chris
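
A rough sketch of the range-dependent ordering rule discussed above: a
write only has to wait for a pending discard when the two touch at least
one common sector. This is illustrative userspace C, not block layer
code. The names sector_range, ranges_overlap and write_must_wait are
hypothetical, and ranges are assumed to be half-open and small enough
that start + len does not overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open sector range [start, start + len). */
struct sector_range {
	uint64_t start;
	uint64_t len;
};

/* True when the two ranges share at least one sector.
 * Assumes start + len does not overflow. */
static bool ranges_overlap(struct sector_range a, struct sector_range b)
{
	if (a.len == 0 || b.len == 0)
		return false;
	return a.start < b.start + b.len && b.start < a.start + a.len;
}

/* Range-dependent barrier: a write submitted after a discard must not
 * be reordered ahead of it if it covers any discarded sector. Writes
 * to unrelated sectors are free to move. */
static bool write_must_wait(struct sector_range discard,
			    struct sector_range write)
{
	return ranges_overlap(discard, write);
}
```

With a discard of sectors 1000..1007, a write to sector 1004 would have
to wait for the discard, while a write starting at sector 2000 (or at
the adjacent sector 1008) would not.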