From mboxrd@z Thu Jan  1 00:00:00 1970
From: Badari Pulavarty <pbadari@us.ibm.com>
Subject: Re: ext3 writepages ?
Date: 10 Feb 2005 10:32:05 -0800
Message-ID: <1108060325.20053.1145.camel@dyn318077bld.beaverton.ibm.com>
References: <OF61009C01.96F70B2E-ON88256FA4.006218C6-88256FA4.0062EA88@us.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Cc: Andreas Dilger <adilger@clusterfs.com>,
	linux-fsdevel@vger.kernel.org, Sonny Rao <sonny@burdell.org>
Received: from e32.co.us.ibm.com ([32.97.110.130]:60855 "EHLO
	e32.co.us.ibm.com") by vger.kernel.org with ESMTP id S262191AbVBJSbj
	(ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
	Thu, 10 Feb 2005 13:31:39 -0500
Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11])
	by e32.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j1AIUSuA573788
	for <linux-fsdevel@vger.kernel.org>; Thu, 10 Feb 2005 13:30:31 -0500
Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168])
	by westrelay02.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j1AIUR3c428874
	for <linux-fsdevel@vger.kernel.org>; Thu, 10 Feb 2005 11:30:27 -0700
Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1])
	by d03av02.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j1AIUFHs008559
	for <linux-fsdevel@vger.kernel.org>; Thu, 10 Feb 2005 11:30:16 -0700
To: Bryan Henderson <hbryan@us.ibm.com>
In-Reply-To: <OF61009C01.96F70B2E-ON88256FA4.006218C6-88256FA4.0062EA88@us.ibm.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Thu, 2005-02-10 at 10:00, Bryan Henderson wrote:
> >Don't you think, filesystems submitting biggest chunks of IO
> >possible is better than submitting 1k-4k chunks and hoping that
> >IO schedulers do the perfect job ? 
> 
> No, I don't see why it would better.  In fact intuitively, I think the I/O 
> scheduler, being closer to the device, should do a better job of deciding 
> in what packages I/O should go to the device.  After all, there exist 
> block devices that don't process big chunks faster than small ones.  But 
> 
> So this starts to look like something where you withhold data from the I/O 
> scheduler in order to prevent it from scheduling the I/O wrongly because 
> you (the pager/filesystem driver) know better.  That shouldn't be the 
> architecture.
> 
> So I'd like still like to see a theory that explains why submitting the 
> I/O a little at a time (i.e. including the bio_submit() in the loop that 
> assembles the I/O) causes the device to be idle more.
> 
> >We all learnt thro 2.4 RAW code about the overhead of doing 512bytes
> >IO and making the elevator merge all the peices together.
> 
> That was CPU time, right?  In the present case, the numbers say it takes 
> the same amount of CPU time to assemble the I/O above the I/O scheduler as 
> inside it.

One clear distinction between submitting smaller chunks and larger
ones is the number of callbacks we get and the processing we need
to do for each of them.
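
As a rough illustration of the callback count (a sketch only: the
1 MiB total, the chunk sizes, and the function name are made up, and
it assumes one callback per submitted request):

```python
def callbacks_needed(total_bytes, chunk_bytes):
    """Requests needed to cover total_bytes, assuming one callback each."""
    # Ceiling division: a partial trailing chunk still needs its own request.
    return -(-total_bytes // chunk_bytes)


total = 1 << 20  # 1 MiB of data to write out (illustrative figure)
for chunk in (512, 4096, 65536):
    print(f"{chunk:6d}-byte chunks: {callbacks_needed(total, chunk)} callbacks")
```

With these made-up sizes, going from 512-byte to 64k submissions
cuts the number of callbacks by a factor of 128 for the same data.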

I don't think we have enough numbers here to get to the bottom of
this. CPU utilization remaining the same in both cases doesn't mean
that the test took exactly the same amount of time. I don't even
think that we are doing a fixed number of IOs. Is it possible that
by doing larger IOs we save CPU and use that CPU to push more data?
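
To show why utilization alone doesn't settle it, here is a sketch
with hypothetical numbers (none of these figures are measurements
from the test; the function name is made up as well):

```python
def cpu_seconds_per_mb(cpu_utilization, elapsed_s, mb_written):
    """CPU time spent per MB moved: utilization * wall time / data."""
    return cpu_utilization * elapsed_s / mb_written


# Hypothetical runs: identical 40% CPU utilization in both cases.
small_io = cpu_seconds_per_mb(0.40, elapsed_s=100.0, mb_written=1000.0)
large_io = cpu_seconds_per_mb(0.40, elapsed_s=100.0, mb_written=1250.0)

print(f"small IOs: {small_io:.4f} CPU-sec/MB")
print(f"large IOs: {large_io:.4f} CPU-sec/MB")
```

Both hypothetical runs report the same utilization, yet the second
one spends less CPU per MB because it moved more data in the same
time. We would need elapsed time and total data written to tell
the two cases apart.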


Thanks,
Badari