From: Anton Altaparmakov
Subject: Re: [PATCH] add support for vectored and async I/O to all simple filesystems
Date: Wed, 2 Nov 2005 21:04:44 +0000 (GMT)
To: Jamie Lokier
Cc: Matthew Wilcox, Benjamin LaHaise, Christoph Hellwig, akpm@osdl.org,
	linux-fsdevel@vger.kernel.org
In-Reply-To: <20051102203105.GA20756@mail.shareable.org>

On Wed, 2 Nov 2005, Jamie Lokier wrote:
> Matthew Wilcox wrote:
> > On Wed, Nov 02, 2005 at 11:21:07AM -0500, Benjamin LaHaise wrote:
> > > On Wed, Nov 02, 2005 at 11:06:30AM +0000, Jamie Lokier wrote:
> > > > So it means that any program that mustn't block, must now have a
> > > > stupid kernel version check to make sure it avoids even trying aio
> > > > system calls?  I was under the impression that the right thing to do
> > > > so far was try them, and when EINVAL is returned, use threads instead.
> > >
> > > Yes, that is correct.
> >
> > To be fair, the aio system calls were never _guaranteed_ to not block,
> > were they?  ISTR there were various corner cases that would still get
> > your task blocking while doing an aio submission.
>
> Could we have some documentation of when those corner cases occur?
>
> The main point of aio, as far as I'm aware, is to avoid the need for
> threads (or reduce the number of threads) in programs using I/O that
> shouldn't block, particularly when they are latency sensitive too.
>
> If aio has a habit of blocking from time to time, then it may still be
> useful, but it would be helpful to know that multiple threads are
> still needed to ensure a program (e.g. such as a HTTP or SMB server)
> can continue to make progress - and more helpful to know when.
>
> One particular question is: can aio calls block for a long time due to
> network delays (e.g. over NFS) and I/O delays (e.g. slow disk or CD),
> or are the corner cases restricted to things like paging during memory
> allocation, which is unavoidable one way or another anyway?

Yes, of course aio can block, and in fact it will block for arbitrary
lengths of time.  At least at present, the filesystems' implementations
of ->aio_read and ->aio_write will block left, right and center.

For a start, i_sem is downed, which can block.  Then, once we get inside
readpage or the relevant file write function, buffers may need to be
allocated for the current page, which can block.  Then the filesystem
needs to map the buffers if they are not mapped already; it may have to
take other locks to do so (again, this can block), and, even worse, it
may have to read metadata from disk to find the mapping information for
the buffers.  That is obviously a slow, blocking operation unless your
device is a ram disk.
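
To make those blocking points concrete, here is a stripped-down sketch of
what such a write path roughly looks like.  It is purely illustrative -
the function name is made up, error handling is simplified, it is only
loosely modelled on the 2.6 generic write path, and it assumes the write
fits within a single page - but the numbered comments mark the places
where an "asynchronous" submission can end up sleeping:

#include <linux/fs.h>
#include <linux/pagemap.h>
#include <linux/highmem.h>
#include <linux/aio.h>

/* Illustrative only: a minimal ->aio_write showing where it can sleep. */
static ssize_t sketch_aio_write(struct kiocb *iocb, const char __user *buf,
		size_t count, loff_t pos)
{
	struct file *file = iocb->ki_filp;
	struct address_space *mapping = file->f_mapping;
	struct inode *inode = mapping->host;
	unsigned long index = pos >> PAGE_CACHE_SHIFT;
	unsigned int offset = pos & (PAGE_CACHE_SIZE - 1);
	struct page *page;
	char *kaddr;
	ssize_t err;

	down(&inode->i_sem);		/* 1: sleeps if another writer holds i_sem. */

	page = grab_cache_page(mapping, index);	/* 2: page allocation can sleep. */
	if (!page) {
		err = -ENOMEM;
		goto out_up;
	}

	/*
	 * 3: prepare_write() attaches and maps buffers.  If they are not
	 *    mapped yet, the filesystem has to look up (or allocate) the
	 *    on-disk blocks, possibly reading metadata from disk first.
	 */
	err = mapping->a_ops->prepare_write(file, page, offset, offset + count);
	if (err)
		goto out_page;

	/* 4: copying from the user buffer may fault and have to page it in. */
	kaddr = kmap(page);
	if (__copy_from_user(kaddr + offset, buf, count)) {
		kunmap(page);
		err = -EFAULT;
		goto out_page;
	}
	kunmap(page);

	err = mapping->a_ops->commit_write(file, page, offset, offset + count);
	if (!err)
		err = count;
out_page:
	unlock_page(page);
	page_cache_release(page);
out_up:
	up(&inode->i_sem);
	return err;
}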
And in the write case the filesystem may need to allocate blocks on disk
first, which in turn involves taking locks (and possibly blocking), as
well as reading/writing metadata in order to find free blocks and mark
them as allocated.  And that of course can mean on-disk accesses and
hence blocking again.

I am not sure we need documentation for all that.  It is kind of obvious
once you sit down and think about what a read and a write actually imply.

The only way you can _really_ have guaranteed async I/O is to queue the
I/O to a kernel thread's work queue and return immediately to the caller.
The only things you might then block on are allocating memory for the
"queue entry item" and waiting for the lock on the "queue" so it is safe
to add to it.

And if you do that, it becomes easy to be truly non-blocking: just
allocate with GFP_ATOMIC (and perhaps add __GFP_NORETRY and
__GFP_NORECLAIM?) and take the queue lock with a trylock.  If either of
those fails, punt the request and return immediately to the user with
-EWOULDBLOCK or whatever...  (A rough sketch of what I mean is appended
below my signature.)

You could even optimise the queue lock away by using an atomic
compare-and-exchange based queue-addition function, but that may not be
worth the extra complexity - I don't know.  I guess the big SMP folks may
see contention on that lock...  You could at least make the queues, and
hence their locks, per superblock or something...

Best regards,

	Anton
-- 
Anton Altaparmakov (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
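
PS: Here is the rough sketch of the queue-and-punt idea promised above.
It is only an illustration of the shape of the thing - every name in it
(aio_req, aio_queue, sketch_aio_submit) is made up, and the kernel thread
that drains the queue and does the actual (blocking) I/O is not shown.
The point is simply that the submission path never sleeps: memory is
allocated atomically and the queue lock is only trylocked, and if either
fails the request is punted back to the caller with -EWOULDBLOCK.

#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/wait.h>
#include <linux/errno.h>
#include <linux/aio.h>

/* One queued request - the "queue entry item". */
struct aio_req {
	struct list_head list;
	struct kiocb *iocb;
	loff_t pos;
	size_t count;
};

static LIST_HEAD(aio_queue);			/* work for the kernel thread */
static DEFINE_SPINLOCK(aio_queue_lock);
static DECLARE_WAIT_QUEUE_HEAD(aio_queue_wait);	/* the thread sleeps here */

/* Called on the io_submit() path; must never sleep. */
static int sketch_aio_submit(struct kiocb *iocb, loff_t pos, size_t count)
{
	struct aio_req *req;

	/* Atomic allocation: fail instead of sleeping waiting for memory. */
	req = kmalloc(sizeof(*req), GFP_ATOMIC | __GFP_NORETRY);
	if (!req)
		return -EWOULDBLOCK;
	req->iocb = iocb;
	req->pos = pos;
	req->count = count;

	/* Trylock only: if the queue lock is contended, punt the request. */
	if (!spin_trylock(&aio_queue_lock)) {
		kfree(req);
		return -EWOULDBLOCK;
	}
	list_add_tail(&req->list, &aio_queue);
	spin_unlock(&aio_queue_lock);

	/* Wake the kernel thread; it does all the blocking work later. */
	wake_up(&aio_queue_wait);
	return 0;
}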