From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S265262AbUETWUq (ORCPT ); Thu, 20 May 2004 18:20:46 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S265310AbUETWUq (ORCPT ); Thu, 20 May 2004 18:20:46 -0400
Received: from pao-nav01.pao.digeo.com ([12.47.58.24]:24839
	"HELO pao-nav01.pao.digeo.com") by vger.kernel.org with SMTP
	id S265262AbUETWUm (ORCPT ); Thu, 20 May 2004 18:20:42 -0400
Date: Thu, 20 May 2004 15:23:05 -0700
From: Andrew Morton
To: alexeyk@mysql.com, linuxram@us.ibm.com, nickpiggin@yahoo.com.au,
	peter@mysql.com, linux-kernel@vger.kernel.org, axboe@suse.de
Subject: Re: Random file I/O regressions in 2.6 [patch+results]
Message-Id: <20040520152305.3dbfa00b.akpm@osdl.org>
In-Reply-To: <20040520145902.27647dee.akpm@osdl.org>
References: <200405022357.59415.alexeyk@mysql.com>
	<1084480888.22208.26.camel@dyn319386.beaverton.ibm.com>
	<1084815010.13559.3.camel@localhost.localdomain>
	<200405200506.03006.alexeyk@mysql.com>
	<20040520145902.27647dee.akpm@osdl.org>
X-Mailer: Sylpheed version 0.9.7 (GTK+ 1.2.10; i586-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-OriginalArrivalTime: 20 May 2004 22:20:25.0558 (UTC) FILETIME=[A6253F60:01C43EB8]
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Andrew Morton wrote:
>
> There's still something wrong here. 2.6.6-bk+deadline is pretty equivalent
> to 2.4 from an IO scheduler point of view in this test. Yet it's a couple
> of percent slower.
>
> I don't know why you're still seeing significant discrepancies.
>
> What sort of disk+controller system are you using? If scsi, what is the
> tag queue depth set to? Is writeback caching enabled on the disk?

If the 2.4 and 2.6 disk accounting statistics are to be believed, they show
something interesting.

Workload is one run of sysbench --num-threads=16 --test=fileio
--file-total-size=2G --file-test-mode=rndrw run on ext2.

2.4.27-pre2:

    rio:    5549       (Read requests issued)
    rblk:   259680     (Total sectors read)
    wio:    42398      (Write requests issued)
    wblk:   4368056    (Total sectors written)

2.6.6-bk, as:

    reads:          5983
    readsectors:    201192
    writes:         22548
    writesectors:   4343184

- Note that 2.6 read about 20% less data from the disk. We observed this
  before. It appears that 2.6 page replacement decisions are working
  better for this workload.

- Despite that, 2.6 issued *more* read requests. So it is submitting more,
  and smaller, I/Os.

- Both kernels wrote basically the same amount of data. 2.6 a little less,
  perhaps because of fsync() optimisations.

- But 2.6 issued far fewer write requests. Half as many as 2.4 - a huge
  difference. There are a number of reasons why this could happen but
  frankly, I don't have a clue what's going on in there.

Given that 2.6 is issuing fewer IO requests it should be performing faster
than 2.4. The reason that the two kernels are achieving about the same
throughput despite this is that the disk is performing writeback caching
and is absorbing 2.4's smaller write requests.

I set the IDE disk to do writethrough (hdparm -W0):

2.6.6-bk, as:

    Time spent for test: 89.9427s
    0.04s user 5.24s system 1% cpu 4:51.62 total

2.4.27-pre2:

    Time spent for test: 107.8293s
    0.04s user 6.00s system 1% cpu 7:26.47 total

as expected.

Open questions are:

a) Why is 2.6 write coalescing so superior to 2.4?

b) Why is 2.6 issuing more read requests, for less data?

c) Why is Alexey seeing dissimilar results?
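
For anyone who wants to check the arithmetic behind the observations above,
here is a rough Python sketch that works out the relative changes and the
average request size from the counters quoted in this mail. It is only an
illustration: the dictionary keys and function names are made up for the
example, and it is not how the numbers were collected.

```python
# Counters copied from the figures quoted in this mail.
# The key names are hypothetical; 2.4 reports rio/rblk/wio/wblk and
# 2.6 reports reads/readsectors/writes/writesectors.
STATS = {
    "2.4.27-pre2": {
        "reads": 5549, "read_sectors": 259680,
        "writes": 42398, "write_sectors": 4368056,
    },
    "2.6.6-bk, as": {
        "reads": 5983, "read_sectors": 201192,
        "writes": 22548, "write_sectors": 4343184,
    },
}


def relative_change(old, new):
    """Fractional change going from old to new (negative means less)."""
    return (new - old) / old


def avg_request_sectors(requests, sectors):
    """Average number of sectors transferred per issued request."""
    return sectors / requests


def summarise(stats=STATS):
    """Compare 2.6 against 2.4 for each counter and for request size."""
    old = stats["2.4.27-pre2"]
    new = stats["2.6.6-bk, as"]
    summary = {
        key: relative_change(old[key], new[key]) for key in old
    }
    for name, kernel in (("2.4", old), ("2.6", new)):
        summary[name + "_avg_read"] = avg_request_sectors(
            kernel["reads"], kernel["read_sectors"])
        summary[name + "_avg_write"] = avg_request_sectors(
            kernel["writes"], kernel["write_sectors"])
    return summary


if __name__ == "__main__":
    for key, value in summarise().items():
        print(f"{key:>14}: {value:10.3f}")
```

The per-request averages make the "more, and smaller" reads and the larger
coalesced writes in 2.6 easier to see than the raw counters do.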