From mboxrd@z Thu Jan 1 00:00:00 1970
From: Badari Pulavarty
Subject: Re: ext3 writepages ?
Date: 10 Feb 2005 10:32:05 -0800
Message-ID: <1108060325.20053.1145.camel@dyn318077bld.beaverton.ibm.com>
References:
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Cc: Andreas Dilger , linux-fsdevel@vger.kernel.org, Sonny Rao
Received: from e32.co.us.ibm.com ([32.97.110.130]:60855 "EHLO e32.co.us.ibm.com") by vger.kernel.org with ESMTP id S262191AbVBJSbj (ORCPT ); Thu, 10 Feb 2005 13:31:39 -0500
Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e32.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j1AIUSuA573788 for ; Thu, 10 Feb 2005 13:30:31 -0500
Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by westrelay02.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j1AIUR3c428874 for ; Thu, 10 Feb 2005 11:30:27 -0700
Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1]) by d03av02.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j1AIUFHs008559 for ; Thu, 10 Feb 2005 11:30:16 -0700
To: Bryan Henderson
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Thu, 2005-02-10 at 10:00, Bryan Henderson wrote:
> >Don't you think, filesystems submitting biggest chunks of IO
> >possible is better than submitting 1k-4k chunks and hoping that
> >IO schedulers do the perfect job ?
>
> No, I don't see why it would be better. In fact intuitively, I think the I/O
> scheduler, being closer to the device, should do a better job of deciding
> in what packages I/O should go to the device. After all, there exist
> block devices that don't process big chunks faster than small ones. But
>
> So this starts to look like something where you withhold data from the I/O
> scheduler in order to prevent it from scheduling the I/O wrongly because
> you (the pager/filesystem driver) know better. That shouldn't be the
> architecture.
>
> So I'd still like to see a theory that explains why submitting the
> I/O a little at a time (i.e. including the bio_submit() in the loop that
> assembles the I/O) causes the device to be idle more.
>
> >We all learnt thro 2.4 RAW code about the overhead of doing 512bytes
> >IO and making the elevator merge all the pieces together.
>
> That was CPU time, right? In the present case, the numbers say it takes
> the same amount of CPU time to assemble the I/O above the I/O scheduler as
> inside it.

One clear distinction between submitting smaller chunks and larger ones
is the number of callbacks we get and the processing we need to do.

I don't think we have enough numbers here to get to the bottom of this.
That CPU utilization remains the same in both cases doesn't mean that
the test took exactly the same amount of time. I don't even think that we
are doing a fixed number of IOs. It's possible that by doing larger
IOs we save CPU and use that CPU to push more data?

Thanks,
Badari
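
To put rough numbers on the callback point above, here is a minimal
sketch. It assumes one completion callback per submitted request and no
merging below the submitter, which is a simplification of what the
block layer actually does. The function name and the 1 MiB figure are
hypothetical, chosen only for illustration.

```python
def completion_callbacks(total_bytes: int, chunk_bytes: int) -> int:
    """Requests (and so completion callbacks) needed to write total_bytes.

    Assumption: every submitted chunk completes separately, i.e. nothing
    below the submitter merges adjacent requests.
    """
    if total_bytes < 0 or chunk_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and chunk_bytes > 0")
    # Ceiling division: a partial final chunk still costs one request.
    return -(-total_bytes // chunk_bytes)


if __name__ == "__main__":
    total = 1 << 20  # 1 MiB of dirty data (hypothetical workload)
    for chunk in (512, 4096, 65536):
        print(f"{chunk:>6}-byte chunks: "
              f"{completion_callbacks(total, chunk)} callbacks")
```

Under these assumptions, 1 MiB takes 2048 callbacks at 512 bytes, 256
at 4k, and 16 at 64k. This says nothing about elapsed time or device
idle time, which is the part we still lack numbers for.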