From: Badari Pulavarty
To: suparna@in.ibm.com, akpm@osdl.org
Cc: linux-kernel@vger.kernel.org, linux-aio@kvack.org
Subject: Re: [PATCH][2.6-mm] Readahead issues and AIO read speedup
Date: Thu, 7 Aug 2003 09:01:01 -0700
Message-Id: <200308070901.01119.pbadari@us.ibm.com>
In-Reply-To: <20030807100120.GA5170@in.ibm.com>
References: <20030807100120.GA5170@in.ibm.com>

Suparna,

I noticed exactly the same thing while running a database benchmark on
filesystems (without AIO). I added instrumentation in the SCSI layer to
record the I/O pattern and found that we were doing a very large number
(4 million) of 4K reads during my benchmark run. I traced these and found
that all of those reads were generated by the slow read path, because the
readahead window was maximally shrunk. When I forced the readahead code
to read 16K (my database page size) whenever the ra window was closed,
I saw a 20% improvement in my benchmark.

I have asked Ramchandra Pai (linuxram@us.ibm.com) to investigate it
further.

Thanks,
Badari

On Thursday 07 August 2003 03:01 am, Suparna Bhattacharya wrote:
> I noticed a problem with the way do_generic_mapping_read
> and readahead work for the case of large reads, especially
> random reads. This was leading to very inefficient behaviour
> for a stream of AIO reads. (See the results a little later
> in this note.)
>
> 1) We should be reading ahead at least the pages that are
> required by the current read request (even if the ra window
> is maximally shrunk). I think I've seen this in 2.4 - we
> seem to have lost that in 2.5.
> The result is that sometimes (for large random reads) we end
> up doing reads one page at a time, waiting for each read to
> complete before reading the next page, and so on, even for a
> large read (until we build up a readahead window again).
>
> 2) Once the ra window is maximally shrunk, the responsibility
> for reading the pages and re-building the window is shifted
> to the slow path in read, which breaks down in the case of
> a stream of AIO reads where multiple iocbs submit reads
> to the same file rather than serialise the wait for i/o
> completion.
>
> So here is a patch that fixes this by making sure we do
> (1) and pushing up the handle_ra_miss calls for the maximally
> shrunk case before the loop that waits for I/O completion.
>
> Does it make a difference? A lot, actually.
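
To make the behaviour in (1) concrete: with a maximally shrunk window, a
16K read on 4K pages ends up issued as four serialised single-page reads;
the fix amounts to clamping the number of pages submitted for readahead so
it always covers the whole request. Below is a minimal standalone sketch
of that clamp. The struct and names are illustrative only, not the
kernel's actual file_ra_state or the patch itself.

/*
 * Illustrative sketch, not the 2.6-mm patch: never submit fewer
 * readahead pages than the current read request needs, even when
 * the window has shrunk to its minimum.
 */
#include <stdio.h>

struct ra_state {
        unsigned long size;     /* current readahead window, in pages */
};

/* Clamp the window so it covers at least the request itself. */
static unsigned long ra_pages_to_read(const struct ra_state *ra,
                                      unsigned long req_pages)
{
        return ra->size < req_pages ? req_pages : ra->size;
}

int main(void)
{
        struct ra_state ra = { .size = 1 };     /* maximally shrunk */

        /*
         * A 16K read with 4K pages needs 4 pages; submit all 4 at
         * once instead of one page at a time.
         */
        printf("pages to read: %lu\n", ra_pages_to_read(&ra, 4));
        return 0;
}

With this kind of clamp, a stream of AIO reads no longer falls back to
the one-page-at-a-time slow path while the window is being rebuilt, which
is the serialisation point (2) describes.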