From: Jamie Lokier
Subject: Re: [rfc] fsync_range?
Date: Tue, 20 Jan 2009 18:31:21 +0000
Message-ID: <20090120183120.GD27464@shareable.org>
References: <20090120164726.GA24891@wotan.suse.de>
In-Reply-To: <20090120164726.GA24891@wotan.suse.de>
To: Nick Piggin
Cc: linux-fsdevel@vger.kernel.org

Nick Piggin wrote:
> Just wondering if we should add an fsync_range syscall like AIX and
> some BSDs have? It's pretty simple for the pagecache since it
> already implements the full sync with range syncs anyway. For
> filesystems and user programs, I imagine it is a bit easier to
> convert to fsync_range from fsync rather than use the sync_file_range
> syscall.
>
> Having a flags argument is nice, but AIX seems to use O_SYNC as a
> flag, I wonder if we should follow?

I like the idea. It's much easier to understand than sync_file_range,
whose man page doesn't really explain how to use it correctly.

But how is fsync_range different from the sync_file_range syscall with
all its flags set?

For database writes, you typically write a bunch of stuff in various
regions of a big file (or multiple files), then ideally fdatasync
some/all of the written ranges - with writes committed to disk in the
best order determined by the OS and I/O scheduler.

For this, taking a vector of multiple ranges would be nice.
Alternatively, issuing parallel fsync_range calls from multiple
threads would approximate the same thing - if (big if) they aren't
serialised by the kernel.

-- 
Jamie
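
For reference, the "sync_file_range with all its flags set" case and the
multi-range idea can be sketched with the existing Linux syscall roughly as
follows. This is only an illustration, not a proposed interface: the
struct byte_range type and the sync_ranges() helper are hypothetical names
made up for this sketch. The first pass starts writeback on every range
without waiting, which leaves the ordering to the kernel and I/O scheduler;
the second pass waits on each range with all three flags set. Note that the
man page says sync_file_range does not write out file metadata, so this is
not a full substitute for fdatasync.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* sync_file_range, SYNC_FILE_RANGE_* (Linux only) */
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>     /* pwrite/close for callers */

/* Hypothetical helper type: one byte range of a file. */
struct byte_range {
    off_t offset;
    off_t length;   /* 0 means "through to end of file" */
};

/*
 * Hypothetical helper: flush several ranges of one file.
 * Pass 1 starts writeback on every range without waiting.
 * Pass 2 revisits each range with all three flags set and waits for it.
 * Returns 0 on success, -1 on the first error (errno set by the syscall).
 */
int sync_ranges(int fd, const struct byte_range *ranges, size_t n)
{
    assert(ranges != NULL || n == 0);

    /* Pass 1: queue the dirty pages of every range for writeback. */
    for (size_t i = 0; i < n; i++) {
        if (sync_file_range(fd, ranges[i].offset, ranges[i].length,
                            SYNC_FILE_RANGE_WRITE) != 0)
            return -1;
    }

    /* Pass 2: all flags set, so each call waits for its range. */
    for (size_t i = 0; i < n; i++) {
        if (sync_file_range(fd, ranges[i].offset, ranges[i].length,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER) != 0)
            return -1;
    }
    return 0;
}
```

A caller would pwrite() its regions, fill an array of struct byte_range
with the same offsets and lengths, and call sync_ranges(fd, ranges, n)
once. Whether the kernel actually overlaps the I/O for the separate ranges
is exactly the open question above.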