From: Jamie Lokier
Subject: Re: [rfc] fsync_range?
Date: Wed, 21 Jan 2009 20:53:56 +0000
Message-ID: <20090121205356.GB16133@shareable.org>
References: <20090121013606.GE24891@wotan.suse.de>
To: Bryan Henderson
Cc: Nick Piggin, linux-fsdevel@vger.kernel.org

Bryan Henderson wrote:
> Nick Piggin wrote on 01/20/2009 05:36:06 PM:
>
> > On Tue, Jan 20, 2009 at 01:25:59PM -0800, Bryan Henderson wrote:
> > > > For this, taking a vector of multiple ranges would be nice.
> > > > Alternatively, issuing parallel fsync_range calls from multiple
> > > > threads would approximate the same thing - if (big if) they aren't
> > > > serialised by the kernel.
> > >
> > > That sounds like a job for fadvise().  A new FADV_WILLSYNC says you're
> > > planning to sync that data soon.  The kernel responds by scheduling the
> > > I/O immediately.  fsync_range() takes a single range and in this case is
> > > just a wait.  I think it would be easier for the user as well as more
> > > flexible for the kernel than a multi-range fsync_range() or multiple
> > > threads.
> >
> > A problem is that the kernel will not always be able to schedule the
> > IO without blocking (various mutexes or block device queues full etc).
>
> I don't really see the problem with that.  We're talking about a program
> that is doing device-synchronous I/O.  Blocking is a way of life.  Plus,
> the beauty of advice is that if it's hard occasionally, the kernel can
> just ignore it.

If you have 100 file regions, each one a few pages in size, and you do
100 fsync_range() calls, that results in potentially far from optimal
I/O scheduling (e.g. all over the disk) *and* 100 low-level disk cache
flushes (I/O barriers) instead of just one at the end.

100 head seeks and 100 cache flush ops can be very expensive.

This is the point of taking a vector of ranges to flush - or some
other way to "plug" the I/O and only wait for it after submitting it
all.

--
Jamie
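
[Editor's sketch, not part of the original mail: one way to approximate the
"submit it all, then wait once" pattern today is the existing
sync_file_range() call.  The struct and function names below are made up for
illustration, and sync_file_range() never issues a disk cache flush, so this
only covers the I/O-scheduling half of Jamie's point; a vectored
fsync_range() could additionally finish with a single barrier.]

#define _GNU_SOURCE
#include <fcntl.h>

/* Hypothetical descriptor of one dirty region of the file. */
struct flush_range {
	off_t offset;
	off_t nbytes;
};

/*
 * Pass 1 starts writeback on every range without waiting, so the block
 * layer can merge and sort the requests; pass 2 waits for all of them.
 */
static int sync_ranges(int fd, const struct flush_range *r, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (sync_file_range(fd, r[i].offset, r[i].nbytes,
				    SYNC_FILE_RANGE_WRITE) < 0)
			return -1;

	for (i = 0; i < n; i++)
		if (sync_file_range(fd, r[i].offset, r[i].nbytes,
				    SYNC_FILE_RANGE_WAIT_AFTER) < 0)
			return -1;

	return 0;
}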