From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jens Axboe
Subject: Re: barriers vs. reads
Date: Tue, 22 Jun 2004 13:32:45 +0200
Sender: linux-fsdevel-owner@vger.kernel.org
Message-ID: <20040622113245.GA1104@suse.de>
References: <20040622005302.A1325@almesberger.net>
 <20040622073919.GV12881@suse.de>
 <20040622045004.C1325@almesberger.net>
 <20040622075531.GX12881@suse.de>
 <20040622112802.GA21456@mail.shareable.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Werner Almesberger , linux-fsdevel@vger.kernel.org
Return-path:
Received: from ns.virtualhost.dk ([195.184.98.160]:27802 "EHLO virtualhost.dk")
 by vger.kernel.org with ESMTP id S262547AbUFVLcy (ORCPT );
 Tue, 22 Jun 2004 07:32:54 -0400
To: Jamie Lokier
Content-Disposition: inline
In-Reply-To: <20040622112802.GA21456@mail.shareable.org>
List-Id: linux-fsdevel.vger.kernel.org

On Tue, Jun 22 2004, Jamie Lokier wrote:
> Jens Axboe wrote:
> > > But do we have cases where reads must not cross write barriers ?
> >
> > To me, it's the expected behaviour. If you issue a barrier write, a read
> > issued later should not be able to fetch old data.
>
> Two things:
>
>    1. A read _which doesn't overlap writes before the barrier_
>       should be ok before the barrier with no visible change.
>
>       So, look at the block numbers and permit reordering if there's
>       no overlap. This reordering is semantically invisible.

You mean a read that doesn't contain sectors that overlap with the
barrier writes? Yes, that would be fine. It's easier said than done,
though. Current I/O schedulers don't handle barriers very efficiently:
they push all pending requests from the internal sorted tree to the
dispatch list, which is always accessed in FIFO-like fashion (the I/O
scheduler adds to the tail, the driver eats from the head). So if you
wanted to optimize this, that would have to change.

>    2. Other than O_DIRECT, can the I/O subsystem issue reads that
>       overlap writes in flight? Surely that never occurs?

No, it can only happen for reads that don't go through the page cache.

>       If it never occurs, then reads can be safely moved before write
>       barriers without looking at block numbers.

It can happen with direct I/O of any sort, so the solution has to take
this into account. That's why we currently have handling for rbtree
aliases as well.

--
Jens Axboe
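
For illustration only, the overlap test discussed in point 1 could be
sketched roughly as below. This is not block layer code: Request,
overlaps() and read_may_pass_barrier() are hypothetical names made up
for this sketch, and it assumes each request is described by a start
sector and a sector count. A read is allowed ahead of the barrier only
if it shares no sector with any write queued up to and including the
barrier write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical request descriptor, not a kernel structure."""
    sector: int              # first sector touched
    nr_sectors: int          # number of sectors touched
    is_write: bool = False
    is_barrier: bool = False


def overlaps(a: Request, b: Request) -> bool:
    """True if the half-open sector ranges of a and b intersect."""
    return (a.sector < b.sector + b.nr_sectors
            and b.sector < a.sector + a.nr_sectors)


def read_may_pass_barrier(read: Request, queued: list) -> bool:
    """Decide whether a read may be moved ahead of the barrier.

    `queued` holds the requests queued ahead of the read, up to and
    including the barrier write. The read may be reordered only if it
    overlaps none of those writes (direct I/O can make such overlaps
    real, so the check cannot be skipped).
    """
    return not any(r.is_write and overlaps(read, r) for r in queued)


if __name__ == "__main__":
    queued = [
        Request(sector=0, nr_sectors=8, is_write=True),
        Request(sector=100, nr_sectors=8, is_write=True, is_barrier=True),
    ]
    # Disjoint read: reordering would be semantically invisible.
    print(read_may_pass_barrier(Request(sector=50, nr_sectors=8), queued))
    # Read overlapping the first write: must stay behind the barrier.
    print(read_may_pass_barrier(Request(sector=4, nr_sectors=8), queued))
```

Note that this only covers the decision itself. As described above, the
schedulers currently flush everything to a FIFO dispatch list when a
barrier arrives, so acting on the result would need changes there too.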