From: Werner Almesberger
Subject: Re: barriers vs. reads - O_DIRECT
Date: Fri, 25 Jun 2004 00:21:43 -0300
To: Jamie Lokier
Cc: Bryan Henderson, linux-fsdevel@vger.kernel.org
Message-ID: <20040625002143.A28137@almesberger.net>
In-Reply-To: <20040624224259.GA12840@mail.shareable.org>; from jamie@shareable.org on Thu, Jun 24, 2004 at 11:42:59PM +0100
References: <20040623214845.A21586@almesberger.net> <20040624144638.V1325@almesberger.net> <20040624185059.GA11175@mail.shareable.org> <20040624175516.W1325@almesberger.net> <20040624224259.GA12840@mail.shareable.org>

Jamie Lokier wrote:
> For that purpose, a barrier has a set of writes which must come before
> it, and a set of writes which must come after.  These represent a
> transaction set.

Okay, a partial order then. AFAIK, this doesn't allow us to express
things like "do A after one of { B, C } is done". That could be useful
for redundant storage structures, but we may not care.

> A then (barrier) B
> C then (barrier) D
>
> It's ok to schedule those as:
>
> A, B then (barrier) C, D

I think you mean "A, C then (barrier) B, D" :-)

There's a problem, though: the first has the following relations:
A < B, C < D. The second has: A < B, A < D, C < B, C < D. The merged
form needs only one barrier instead of two, so the elevator now has to
decide whether avoiding the side-effects of a barrier (a cache flush,
or such) is worth the cost of the additional ordering restrictions.
(I'm carefully avoiding saying "has less cost", because the elevator
may not always pick the "cheapest" variant.)
> It would be nice to come up with an interface that the loopback device
> can support and relay through the underlying fs.

You really like those loopback devices, don't you? :-)

> ext3 and reiserfs both offered this from the beginning, so it's
> important to someone.

Sigh, yes ...

> committed prerequisite writes to stable storage.  At other times, a
> barrier is there only to preserve ordering so that a journal
> functions, but it's not required that the data is actually committed
> to storage immediately -- merely that it _will_ be committed in order.

Well, to implement cross-device barriers, you need a means to find out
whether the prerequisites for the local device to cross a barrier are
satisfied, and to make the local device wait until they are. This may
be implemented in a completely different thread (or whatever) than the
actual IO.

Making an elevator non-work-conserving is an interesting exercise in
itself. (It may be feasible: I've done it for a power management
experiment (*), and it works at least for PATA.)

(*) To stop, I simply make next_req return NULL. queue_empty still
    returns the correct result. To resume, I have an external trigger
    that calls blk_start_queue.

There's of course the question of whether the elevator is really the
right place for all this. Perhaps just embedding a callback in a
barrier request could serve as a more general solution, in particular
one that would allow us to avoid having to solve all problems at
once :-)

- Werner

--
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina     wa@almesberger.net    /
/_http://www.almesberger.net/____________________________________________/