From: Badari Pulavarty
To: Andrew Morton
Cc: suparna@in.ibm.com, linux-kernel@vger.kernel.org, linux-aio@kvack.org
Subject: Re: [PATCH][2.6-mm] Readahead issues and AIO read speedup
Date: Thu, 7 Aug 2003 10:21:39 -0700
Message-Id: <200308071021.39816.pbadari@us.ibm.com>
In-Reply-To: <20030807092800.58335e84.akpm@osdl.org>
References: <20030807100120.GA5170@in.ibm.com> <200308070901.01119.pbadari@us.ibm.com> <20030807092800.58335e84.akpm@osdl.org>

On Thursday 07 August 2003 09:28 am, Andrew Morton wrote:
> Badari Pulavarty wrote:
> > I noticed the exact same thing while testing a database benchmark
> > on filesystems (without AIO). I added instrumentation in the SCSI layer
> > to record the I/O pattern, and I found that we were doing lots of
> > (4 million) 4k reads in my benchmark run. I traced them and found that
> > all those reads were generated by the slow read path, since the
> > readahead window was maximally shrunk. When I forced the readahead code
> > to read 16k (my database pagesize) when the ra window was closed, I saw
> > a 20% improvement in my benchmark. I have asked "Ramchandra Pai"
> > (linuxram@us.ibm.com) to investigate it further.
> But if all the file's pages are already in pagecache (a common case)
> this patched kernel will consume extra CPU pointlessly poking away at
> pagecache. Reliably shrinking the window to zero is important for this
> reason.

Yes! I hardcoded it to 16k, since I know that all my reads will be 16k
(at least). The correct solution would be to do readahead of exactly the
pages required by the current read (as Suparna suggested).

> If the database pagesize is 16k then the application should be submitting
> 16k reads, yes?

Yes. The database always does I/O in at least 16k chunks (in my case).

> If so then these should not be creating 4k requests at the
> device layer! So what we need to do is to ensure that at least those 16k
> worth of pages are submitted in a single chunk. Without blowing CPU if
> everything is cached. Tricky. I'll take a look at what's going on.

When the readahead window is closed, the slow read path submits I/O in
4k chunks. In fact, it waits for each I/O to finish before reading the
next page, doesn't it? How would you ensure that at least 16k worth of
pages are submitted in a single chunk here? I am hoping that forcing the
readahead code to read the pages needed by the current read would address
this problem.

> Another relevant constraint here (and there are lots of subtle constraints
> in readahead) is that often database files are fragmented all over the
> disk, because they were laid out that way (depends on the database and
> how it was set up). In this case, any extra readahead is a disaster
> because it incurs extra seeks, needlessly.

Agreed. In my case, I made sure that all the files are almost contiguous.
(I put one file per filesystem, and verified through debugfs.)

Thanks,
Badari