From: Badari Pulavarty
To: Andrew Morton
Cc: suparna@in.ibm.com, linux-kernel@vger.kernel.org, linux-aio@kvack.org
Subject: Re: [PATCH][2.6-mm] Readahead issues and AIO read speedup
Date: Thu, 7 Aug 2003 10:21:39 -0700
Message-Id: <200308071021.39816.pbadari@us.ibm.com>
In-Reply-To: <20030807092800.58335e84.akpm@osdl.org>
References: <20030807100120.GA5170@in.ibm.com> <200308070901.01119.pbadari@us.ibm.com> <20030807092800.58335e84.akpm@osdl.org>

On Thursday 07 August 2003 09:28 am, Andrew Morton wrote:
> Badari Pulavarty wrote:
> > I noticed the exact same thing while testing a database benchmark
> > on filesystems (without AIO). I added instrumentation in the SCSI layer
> > to record the I/O pattern, and I found that we were doing lots of
> > (4 million) 4k reads in my benchmark run. I traced them and found that
> > all those reads were generated by the slow read path, since the
> > readahead window was maximally shrunk. When I forced the readahead code
> > to read 16k (my database pagesize) when the ra window was closed, I saw
> > a 20% improvement in my benchmark. I have asked "Ramchandra Pai"
> > (linuxram@us.ibm.com) to investigate it further.
> But if all the file's pages are already in pagecache (a common case)
> this patched kernel will consume extra CPU pointlessly poking away at
> pagecache. Reliably shrinking the window to zero is important for this
> reason.

Yes! I hardcoded it to 16k, since I know that all my reads will be 16k
(at least). The correct solution would be to do readahead of exactly the
pages required by the current read (as Suparna suggested).

> If the database pagesize is 16k then the application should be submitting
> 16k reads, yes?

Yes. The database always does I/O in at least 16k chunks (in my case).

> If so then these should not be creating 4k requests at the
> device layer! So what we need to do is to ensure that at least those 16k
> worth of pages are submitted in a single chunk. Without blowing CPU if
> everything is cached. Tricky. I'll take a look at what's going on.

When the readahead window is closed, the slow read path submits I/O in
4k chunks. In fact, it waits for each I/O to finish before reading the
next page, doesn't it? How would you ensure that at least 16k worth of
pages are submitted in a single chunk here? I am hoping that forcing the
readahead code to read the pages needed by the current read would address
this problem.

> Another relevant constraint here (and there are lots of subtle constraints
> in readahead) is that often database files are fragmented all over the
> disk, because they were laid out that way (depends on the database and
> how it was set up). In this case, any extra readahead is a disaster
> because it incurs extra seeks, needlessly.

Agreed. In my case, I made sure that all the files are almost contiguous.
(I put one file per filesystem, and verified through debugfs.)

Thanks,
Badari