From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754353AbYFAUiV (ORCPT ); Sun, 1 Jun 2008 16:38:21 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1751906AbYFAUiM (ORCPT ); Sun, 1 Jun 2008 16:38:12 -0400
Received: from smtp1.linux-foundation.org ([140.211.169.13]:40271
	"EHLO smtp1.linux-foundation.org" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751924AbYFAUiL (ORCPT );
	Sun, 1 Jun 2008 16:38:11 -0400
Date: Sun, 1 Jun 2008 13:37:27 -0700
From: Andrew Morton
To: Pavel Machek
Cc: mtk.manpages@gmail.com, Hugh Dickins, kernel list, "Rafael J. Wysocki"
Subject: Re: sync_file_range(SYNC_FILE_RANGE_WRITE) blocks?
Message-Id: <20080601133727.4e62ae55.akpm@linux-foundation.org>
In-Reply-To: <20080601114008.GC16843@elf.ucw.cz>
References: <20080530102619.GA2468@elf.ucw.cz>
	<20080530204307.GA4978@ucw.cz>
	<20080531173950.c4f04028.akpm@linux-foundation.org>
	<20080601011501.199af80c.akpm@linux-foundation.org>
	<20080601114008.GC16843@elf.ucw.cz>
X-Mailer: Sylpheed 2.4.8 (GTK+ 2.12.5; x86_64-redhat-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, 1 Jun 2008 13:40:09 +0200 Pavel Machek wrote:

> Hi!
>
> > > > > All I can say so far is that I find the same as you do:
> > > > > SYNC_FILE_RANGE_WRITE (after writing) takes a significant amount of time,
> > > > > more than half as long as when you add in SYNC_FILE_RANGE_WAIT_AFTER too.
> > > > >
> > > > > Which make the sync_file_range call pretty pointless: your usage seems
> > > > > perfectly reasonable to me, but somehow we've broken its behaviour.
> > > > > I'll be investigating ...
> > > >
> > > > It will block on disk queue fullness - sysrq-W will tell.
> > >
> > > Ah, thank you. What a disappointment, though it's understandable.
> > > Doesn't that very severely limit the usefulness of the system call?
> >
> > A bit. The request queue size is runtime tunable though.
>
> Which /sys is that?

/sys/block/sda/queue/nr_requests

> What happens if I set the queue size to pretty
> much infinity, will memory management die horribly?

In theory, no - it's always caused problems when the VM/VFS/FS layer
has relied upon request-queue exhaustion for throttling. Hence all
that code is supposed to work OK when there is no request-queue
blocking.

Of course, (theory/practice != 1.0).

> > I expect major users of this system call will be applications which do
> > small-sized overwrites into large files, mainly databases. That is,
> > once the application developers discover its existence. I'm still
> > getting expressions of wonder from people who I tell about the
> > five-year-old fadvise().
>
> Hey, you have one user now, its called s2disk. But for this call to be
> useful, we'd need asynchronous variant... is there such thing?

Well if you're asking the syscall to shove more data into the block
layer than it can concurrently handle, sure, the block layer will
block. It's tunable...

It can still block in places, of course - we might need to do
synchronous reads to get at metadata and we'll need to allocate memory.

> Okay, I can fork and do the call from another process, but...

I sense a strangeness. What are you actually trying to do with all
of this?

Bear in mind that sync_file_range() doesn't sync metadata (ie: indirect
blocks). So if they weren't already known to have been written, the data
isn't safe.

> - * range which are not presently under writeback.
> + * range which are not presently under writeback. Notice that even this this
> + * may and will block if you attempt to write more than request queue size.

um, OK. I'll fix the grammar a bit there.
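
For reference, here is a minimal userspace sketch of the usage pattern discussed in this thread. It is not taken from s2disk or from the kernel tree. The helper name flush_range() is hypothetical, and treating SYNC_FILE_RANGE_WRITE as a best-effort hint followed by fdatasync() is an assumption of the sketch, chosen because sync_file_range() on its own does not write metadata.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: start writeback for [offset, offset + nbytes)
 * and then make the data durable.
 *
 * SYNC_FILE_RANGE_WRITE only initiates writeout of dirty pages in the
 * range. It may still block when the request queue is full, and it does
 * not write metadata such as indirect blocks, so fdatasync() is the call
 * that provides the durability guarantee in this sketch.
 */
int flush_range(int fd, off_t offset, off_t nbytes)
{
	assert(offset >= 0 && nbytes >= 0);

	/*
	 * Best-effort hint: kick off writeback early. The return value is
	 * deliberately ignored (an assumption of this sketch) because
	 * fdatasync() below reports any failure to reach stable storage.
	 */
	(void)sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);

	/* Flush the data and the metadata needed to read it back. */
	return fdatasync(fd);
}
```

A caller would write its data, call flush_range(fd, offset, length), and treat a nonzero return as a failed flush. If the early call blocks for too long, the request queue size in /sys/block/<dev>/queue/nr_requests is the tunable mentioned above.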