From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jamie Lokier
Subject: Re: [rfc] fsync_range?
Date: Tue, 20 Jan 2009 22:42:39 +0000
Message-ID: <20090120224238.GA31540@shareable.org>
References: <20090120183120.GD27464@shareable.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: linux-fsdevel@vger.kernel.org, Nick Piggin
To: Bryan Henderson
Return-path:
Received: from mail2.shareable.org ([80.68.89.115]:45517 "EHLO mail2.shareable.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753814AbZATWmm (ORCPT ); Tue, 20 Jan 2009 17:42:42 -0500
Content-Disposition: inline
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Bryan Henderson wrote:
> > For database writes, you typically write a bunch of stuff in various
> > regions of a big file (or multiple files), then ideally fdatasync
> > some/all of the written ranges - with writes committed to disk in the
> > best order determined by the OS and I/O scheduler.
> >
> > For this, taking a vector of multiple ranges would be nice.
> > Alternatively, issuing parallel fsync_range calls from multiple
> > threads would approximate the same thing - if (big if) they aren't
> > serialised by the kernel.
>
> That sounds like a job for fadvise(). A new FADV_WILLSYNC says you're
> planning to sync that data soon. The kernel responds by scheduling the
> I/O immediately. fsync_range() takes a single range and in this case is
> just a wait. I think it would be easier for the user as well as more
> flexible for the kernel than a multi-range fsync_range() or multiple
> threads.

FADV_WILLSYNC is already implemented: sync_file_range() with
SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE. That will block in
a few circumstances, but maybe that's inevitable.

If you called FADV_WILLSYNC on a few ranges to mean "soon", how do you
wait until those ranges are properly committed? How do you ensure the
right low-level I/O barriers are sent for those ranges before you
start writing post-barrier data?

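To make the sync_file_range() usage above concrete, here is a rough sketch (not code from this thread) of a two-pass approach on Linux: first hint every range with SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE, then wait on each range by adding SYNC_FILE_RANGE_WAIT_AFTER. The names struct byte_range, start_writeback(), wait_for_writeback() and flush_ranges() are hypothetical, made up for illustration. Note that, as far as I know, sync_file_range() does not write file metadata and does not flush the disk's write cache, so this sketch does not answer the barrier question.

```c
#define _GNU_SOURCE          /* exposes sync_file_range() in <fcntl.h> (Linux-only) */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper type: one dirty byte range of a file. */
struct byte_range {
    off64_t offset;
    off64_t length;
};

/*
 * Pass 1: the "will sync soon" hint. For each range, wait for any
 * writeback already in flight, then start writeback of the pages that
 * are dirty now. This does not wait for the new writes to complete.
 */
static int start_writeback(int fd, const struct byte_range *ranges, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (sync_file_range(fd, ranges[i].offset, ranges[i].length,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE) != 0)
            return -1;       /* errno is set by sync_file_range() */
    }
    return 0;
}

/*
 * Pass 2: wait for each range in turn. Adding WAIT_AFTER makes the call
 * block until the pages written out for that range have completed.
 * No disk cache flush or barrier is requested here.
 */
static int wait_for_writeback(int fd, const struct byte_range *ranges, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (sync_file_range(fd, ranges[i].offset, ranges[i].length,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER) != 0)
            return -1;
    }
    return 0;
}

/* Start writeback on all ranges first, then wait for all of them.
 * Returns 0 on success, -1 on the first failing call. */
int flush_ranges(int fd, const struct byte_range *ranges, size_t count)
{
    assert(ranges != NULL || count == 0);
    if (start_writeback(fd, ranges, count) != 0)
        return -1;
    return wait_for_writeback(fd, ranges, count);
}
```

Starting writeback on every range before waiting on any of them lets the I/O scheduler see all the writes at once, which is the point of the multi-range idea; the wait pass then mostly just blocks.
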
I think you're saying call FADV_WILLSYNC first on all the ranges, then
call fsync_range() on each range in turn to wait for the I/O to be
complete - although that will cause unnecessary I/O barriers, one per
fsync_range().

You can do something like that with sync_file_range() at the moment,
except no way to ask for the barrier.

-- Jamie