From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Sat, 11 May 2002 22:37:12 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Sat, 11 May 2002 22:37:11 -0400 Received: from parcelfarce.linux.theplanet.co.uk ([195.92.249.252]:26124 "EHLO www.linux.org.uk") by vger.kernel.org with ESMTP id ; Sat, 11 May 2002 22:37:09 -0400 Message-ID: <3CDDD614.FA8A4AB9@zip.com.au> Date: Sat, 11 May 2002 19:40:20 -0700 From: Andrew Morton X-Mailer: Mozilla 4.79 [en] (X11; U; Linux 2.4.19-pre4 i686) X-Accept-Language: en MIME-Version: 1.0 To: yodaiken@fsmlabs.com CC: Lincoln Dale , Linus Torvalds , Larry McVoy , Gerrit Huizenga , Alan Cox , Martin Dalecki , Padraig Brady , Anton Altaparmakov , Kernel Mailing List Subject: Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56) In-Reply-To: <20020511111935.B30126@work.bitmover.com> <5.1.0.14.2.20020512092751.02bcca40@mira-sjcm-3.cisco.com> <20020511183616.A10381@hq.fsmlabs.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org yodaiken@fsmlabs.com wrote: > > We did some i/o profiling about 6 years ago on a big scientific > app that had started in fortran and had been rewritten in c++ > the fortran code used r/w on files and used temp files > the c++ did memmaps and had big data structures - taking advantage of > memory management. > one thing I thought was interesting is that it was easy to see how a smart > algorithm, not even such a smart one, could adapt i/o to the patterns of > i/o in the fortran code, but the c++ i/o patters were really complex. > > when everything goes into the page cache, it seems like you will loose > information. > That is certainly the case. If the application is seekily writing to a file then we currently lay the file out on-disk in the order in which the application seeked. So reading the file back linearly is very slow. Now this is not necessarily a bad thing - if the file was created seekily then it will probably be _used_ seekily so no big loss probably. This problem is pretty unsolvable for filesystems which map blocks to their disk address at write(2) time. It can be solved for allocate-on-flush filesystems via a sort of the dirty page list, or by maintaining ->dirty_pages in a tree or whatever. There is one "file" where this problem really does matter - the blockdev mapping "/dev/hda1". It is both highly fragmented and poorly sorted on the dirty_pages list. It's pretty trivial to perform a sillysort at writeout time: if we just wrote page N and the next page isn't N+1 then do a pagecache probe for "N+1". That's probably sufficient. If not, there's a simple little sort routine over at http://www.chiark.greenend.org.uk/~sgtatham/algorithms/listsort.html which is appropriate to our lists. I'll be taking a look at the sillysort option once I've cleared away some other I/O scheduling glitches. -