From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S264134AbUEDAvk (ORCPT ); Mon, 3 May 2004 20:51:40 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S264169AbUEDAvj (ORCPT ); Mon, 3 May 2004 20:51:39 -0400 Received: from e35.co.us.ibm.com ([32.97.110.133]:30449 "EHLO e35.co.us.ibm.com") by vger.kernel.org with ESMTP id S264134AbUEDAvh (ORCPT ); Mon, 3 May 2004 20:51:37 -0400 Subject: Re: Random file I/O regressions in 2.6 From: Ram Pai To: Nick Piggin Cc: Andrew Morton , peter@mysql.com, alexeyk@mysql.com, linux-kernel@vger.kernel.org, axboe@suse.de In-Reply-To: <4096E1A6.2010506@yahoo.com.au> References: <200405022357.59415.alexeyk@mysql.com> <409629A5.8070201@yahoo.com.au> <20040503110854.5abcdc7e.akpm@osdl.org> <1083615727.7949.40.camel@localhost.localdomain> <20040503135719.423ded06.akpm@osdl.org> <1083620245.23042.107.camel@abyss.local> <20040503145922.5a7dee73.akpm@osdl.org> <4096DC89.5020300@yahoo.com.au> <20040503171005.1e63a745.akpm@osdl.org> <4096E1A6.2010506@yahoo.com.au> Content-Type: text/plain Organization: Message-Id: <1083631804.4544.16.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5) Date: 03 May 2004 17:50:05 -0700 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 2004-05-03 at 17:19, Nick Piggin wrote: > Andrew Morton wrote: > > Nick Piggin wrote: > > > >>>That's one of its usage patterns. It's also supposed to detect the > >>>fixed-sized-reads-seeking-all-over-the-place situation. In which case it's > >>>supposed to submit correctly-sized multi-page BIOs. But it's not working > >>>right for this workload. > >>> > >>>A naive solution would be to add special-case code which always does the > >>>fixed-size readahead after a seek. Basically that's > >>> > >>> if (ra->next_size == -1UL) > >>> force_page_cache_readahead(...) > >>> > >> > >>I think a better solution to this case would be to ensure the > >>readahead window is always min(size of read, some large number); > >> > > > > > > That would cause the kernel to perform lots of pointless pagecache lookups > > when the file is already 100% cached. > > > > > That's pretty sad. You need a "preread" or something which > sends the pages back... or uses the actor itself. readahead > would then have to be reworked to only run off the end of > the read window, but that is what it should be doing anyway. Sorry, If I am saying this again. I have checked the behaviour of the readahead code using my user level simulator as well as running some DSS benchmark and iozone benchmark. It generates a steady stream of large i/o for large-random-reads and should not exhibit the bad behavior that we are seeing. I feel this bad behavior is because of interleaved access by multiple thread. To illustrate with an example: t1 request reads from page 100 to 104 simultaneously t2 requests reads on the same fd from 200 to 204 So do_page_cache_readahead() can be called in the following pattern. 100,200,101,201,102,202,103,203,104,204. Because of this pattern the readahaed code assumes that the read pattern is absolutely random and hence closes the readahead window. I think I should generate a patch to validate this behavior, I will. How about having some /proc counters that keep track of number of window-closes because of cache-hits and because of cache-misses? RP