From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jamie Lokier
Subject: Re: barriers vs. reads - O_DIRECT
Date: Thu, 24 Jun 2004 19:50:59 +0100
Sender: linux-fsdevel-owner@vger.kernel.org
Message-ID: <20040624185059.GA11175@mail.shareable.org>
References: <20040623214845.A21586@almesberger.net> <20040624144638.V1325@almesberger.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Bryan Henderson, linux-fsdevel@vger.kernel.org
Return-path:
Received: from mail.shareable.org ([81.29.64.88]:27818 "EHLO mail.shareable.org") by vger.kernel.org with ESMTP id S264771AbUFXSvo (ORCPT ); Thu, 24 Jun 2004 14:51:44 -0400
To: Werner Almesberger
Content-Disposition: inline
In-Reply-To: <20040624144638.V1325@almesberger.net>
List-Id: linux-fsdevel.vger.kernel.org

Werner Almesberger wrote:
> Bryan Henderson wrote:
> > It seems obvious to me that whatever ordering guarantees the user gets
> > without the O_DIRECT flag, he should get with it as well.
>
> Yes, it would be nice if we could obtain such behaviour without
> unacceptable performance sacrifices. It seems to me that, if we
> can find an efficient way for serializing all write-write and
> read-write overlaps, plus have explicit barriers for serializing
> non-overlapping writes, this should yield pretty much what
> everyone wants (*). Now, that "if" needs a bit of work ... :-)

Note that what filesystems and databases want is write-write *partial
dependencies*. The per-device I/O barrier is just a crude
approximation.

1. Think about this: two filesystems on different partitions of the
   same device. The writes of each filesystem are independent, yet
   the barriers will force the writes of one filesystem to come before
   later-queued writes of the other.

2. Or, two database back-ends doing direct I/O to two separate files.

It's probably not a big performance penalty, but it illustrates that
the barriers are "bigger" than they need to be. Worth taking into
account when deciding what minimal ordering everyone _really_ wants.
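
To make cases 1 and 2 concrete, here is a rough sketch of a toy queue
model. Nothing in it is kernel code: the request tuples, the "stream"
labels and the ordering_constraints() helper are all made up for
illustration. It just counts which write-write orderings a barrier
imposes when it applies to the whole device, versus only to the writes
of whoever issued it.

```python
from typing import List, Tuple

# A queued request is (kind, stream). kind is "W" (write) or "B" (barrier).
# "stream" is a hypothetical label for the issuer: a partition, a file, etc.
Request = Tuple[str, str]


def ordering_constraints(queue: List[Request],
                         per_stream: bool) -> List[Tuple[int, int]]:
    """Return pairs (i, j): write at index i must complete before write j.

    per_stream=False models a per-device barrier, which orders every
    earlier write against every later write regardless of issuer.
    per_stream=True models a partial dependency, where a barrier only
    orders writes belonging to the same stream as the barrier itself.
    """
    constraints = set()
    for b, (kind, barrier_stream) in enumerate(queue):
        if kind != "B":
            continue
        for i in range(b):                      # requests queued before the barrier
            for j in range(b + 1, len(queue)):  # requests queued after it
                kind_i, stream_i = queue[i]
                kind_j, stream_j = queue[j]
                if kind_i != "W" or kind_j != "W":
                    continue
                if per_stream and not (stream_i == barrier_stream
                                       and stream_j == barrier_stream):
                    continue
                constraints.add((i, j))
    return sorted(constraints)


# Two filesystems on partitions "a" and "b" of the same device.
# Only filesystem "a" issues a barrier.
queue = [("W", "a"), ("W", "b"), ("B", "a"), ("W", "a"), ("W", "b")]

device_wide = ordering_constraints(queue, per_stream=False)
partial = ordering_constraints(queue, per_stream=True)

print("per-device barrier:", device_wide)
print("partial dependency:", partial)
```

In this model the per-device barrier imposes four orderings, three of
which involve the other filesystem's writes, while the partial
dependency imposes only the one that filesystem "a" actually asked for.
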

If you do implement overlap detection logic, then would giving
barriers an I/O range be helpful? E.g. corresponding to partitions.

Here's a few more cases, which may not be quite right even now:

3. What if a journal is on a different device to its filesystem?
   Ideally, write barriers between the different device queues would
   be appropriate.

4. A journalling filesystem mounted on a loopback device. Is this
   reliable now?

5. A journalling filesystem mounted on two loopback devices -- one for
   the fs, one for the journal.

> (*) The only difference being that a completing read doesn't
>     tell you whether the elevator has already passed a barrier.
>     Currently, one could be lured into depending on this.

Isn't the barrier itself an I/O operation which can be waited on?
I agree something could depend on the reads at the moment.

-- Jamie