From: Jamie Lokier
Subject: Re: [rfc] fsync_range?
Date: Wed, 21 Jan 2009 20:53:56 +0000
Message-ID: <20090121205356.GB16133@shareable.org>
References: <20090121013606.GE24891@wotan.suse.de>
To: Bryan Henderson
Cc: Nick Piggin, linux-fsdevel@vger.kernel.org

Bryan Henderson wrote:
> Nick Piggin wrote on 01/20/2009 05:36:06 PM:
>
> > On Tue, Jan 20, 2009 at 01:25:59PM -0800, Bryan Henderson wrote:
> > > > For this, taking a vector of multiple ranges would be nice.
> > > > Alternatively, issuing parallel fsync_range calls from multiple
> > > > threads would approximate the same thing - if (big if) they aren't
> > > > serialised by the kernel.
> > >
> > > That sounds like a job for fadvise().  A new FADV_WILLSYNC says you're
> > > planning to sync that data soon.  The kernel responds by scheduling the
> > > I/O immediately.  fsync_range() takes a single range and in this case is
> > > just a wait.  I think it would be easier for the user as well as more
> > > flexible for the kernel than a multi-range fsync_range() or multiple
> > > threads.
> >
> > A problem is that the kernel will not always be able to schedule the
> > IO without blocking (various mutexes or block device queues full etc).
>
> I don't really see the problem with that.  We're talking about a program
> that is doing device-synchronous I/O.  Blocking is a way of life.  Plus,
> the beauty of advice is that if it's hard occasionally, the kernel can
> just ignore it.

If you have 100 file regions, each one a few pages in size, and you do
100 fsync_range() calls, that results in potentially far from optimal
I/O scheduling (e.g. all over the disk) *and* 100 low-level disk cache
flushes (I/O barriers) instead of just one at the end.

100 head seeks and 100 cache flush ops can be very expensive.

This is the point of taking a vector of ranges to flush - or some
other way to "plug" the I/O and only wait for it after submitting it
all.

--
Jamie
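
[Editor's sketch, not part of the original mail: one way to approximate the
"submit it all, then wait once" pattern today is the existing
sync_file_range() call.  The struct and function names below are made up for
illustration, and sync_file_range() never issues a disk cache flush, so this
only covers the I/O-scheduling half of Jamie's point; a vectored
fsync_range() could additionally finish with a single barrier.]

#define _GNU_SOURCE
#include <fcntl.h>

/* Hypothetical descriptor of one dirty region of the file. */
struct flush_range {
	off_t offset;
	off_t nbytes;
};

/*
 * Pass 1 starts writeback on every range without waiting, so the block
 * layer can merge and sort the requests; pass 2 waits for all of them.
 */
static int sync_ranges(int fd, const struct flush_range *r, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (sync_file_range(fd, r[i].offset, r[i].nbytes,
				    SYNC_FILE_RANGE_WRITE) < 0)
			return -1;

	for (i = 0; i < n; i++)
		if (sync_file_range(fd, r[i].offset, r[i].nbytes,
				    SYNC_FILE_RANGE_WAIT_AFTER) < 0)
			return -1;

	return 0;
}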