From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754714AbXDZFhr (ORCPT ); Thu, 26 Apr 2007 01:37:47 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
        id S1754727AbXDZFhr (ORCPT ); Thu, 26 Apr 2007 01:37:47 -0400
Received: from smtp109.mail.mud.yahoo.com ([209.191.85.219]:31667 "HELO
        smtp109.mail.mud.yahoo.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with SMTP id S1754714AbXDZFhq (ORCPT ); Thu, 26 Apr 2007 01:37:46 -0400
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com.au;
        h=Received:Message-ID:Date:From:User-Agent:X-Accept-Language:MIME-Version:To:CC:Subject:References:In-Reply-To:Content-Type:Content-Transfer-Encoding;
        b=gl2w+O5HjMyTXkm36TicpA4F0i2VtYDeIlvOuktdyJxE+cNVn5fXqkzvWzaecWTylMRa9G8DzUSm8ux90jBf0QHplli42eb4Raoh9Z3oiqFhN3kv2KKABulmkz9Gi4PGl4wkdZaeraZh89sMQz/2b2fJvdVW4mlMkGfKvBUsRQ8= ;
Message-ID: <46303A98.9000605@yahoo.com.au>
Date: Thu, 26 Apr 2007 15:37:28 +1000
From: Nick Piggin
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20051007 Debian/1.7.12-1
X-Accept-Language: en
MIME-Version: 1.0
To: "Eric W. Biederman"
CC: clameter@sgi.com, linux-kernel@vger.kernel.org, Mel Gorman,
        William Lee Irwin III, David Chinner, Jens Axboe,
        Badari Pulavarty, Maxim Levitsky
Subject: Re: [00/17] Large Blocksize Support V3
References: <20070424222105.883597089@sgi.com>
In-Reply-To:
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Eric W. Biederman wrote:
> clameter@sgi.com writes:
>
>
>>V2->V3
>>- More restructuring
>>- It actually works!
>>- Add XFS support
>>- Fix up UP support
>>- Work out the direct I/O issues
>>- Add CONFIG_LARGE_BLOCKSIZE. Off by default which makes the inlines revert
>>  back to constants. Disabled for 32bit and HIGHMEM configurations.
>>  This also allows a gradual migration to the new page cache
>>  inline functions. LARGE_BLOCKSIZE capabilities can be
>>  added gradually and if there is a problem then we can disable
>>  a subsystem.
>>
>>V1->V2
>>- Some ext2 support
>>- Some block layer, fs layer support etc.
>>- Better page cache macros
>>- Use macros to clean up code.
>>
>>This patchset modifies the Linux kernel so that larger block sizes than
>>page size can be supported. Larger block sizes are handled by using
>>compound pages of an arbitrary order for the page cache instead of
>>single pages with order 0.
>
>
> Huh?
>
> You seem to be mixing two very different concepts.
>
> The page cache has no problems supporting things with a block
> size larger than page size.  Now the block device layer may not
> have the code to do the scatter gather into small pages and it
> may not handle buffer heads whose data is split between multiple
> pages.

Yeah, this patch is not really large blocksize support (which we
normally think of as block size > page cache size).

> But this is not a page cache issue.
>
> And generally larger physical pages are a mistake to use.
> Especially as it looks from some of the later comments you don't
> dare test on 32bit because the memory fragments faster.

I actually completely agree with this, and I'm concerned in general
about using higher order pages. I think it is fundamentally the
wrong approach because of fragmentation and defragmentation costs
(similarly to Linus's take on page colouring).

I think starting with the assumption that we _want_ to use higher
order allocations, and then creating all this complexity around that
is not a good one, and if we start introducing things that _require_
significant higher order allocations to function then it is a nasty
thing for robustness.

> Is it common for hardware that supports large block sizes to not
> support splitting those blocks apart during DMA?  Unless it is common
> the whole premise of this patchset seems broken.
>
> I suspect what needs to be fixed is the page cache block device
> interface so that we have helper functions that know how to stuff
> a single block into several pages.

I am working now and again on some code to do this, it is a big job
but I think it is the right way to do it. But it would take a long
time to get stable and supported by filesystems...

> That would make the choice of using larger order pages (essentially
> increasing PAGE_SIZE) something that can be investigated in parallel.

I agree that hardware inefficiencies should be handled by increasing
PAGE_SIZE (not making PAGE_CACHE_SIZE > PAGE_SIZE) at the arch level.

-- 
SUSE Labs, Novell Inc.
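
To make the "stuff a single block into several pages" idea concrete,
here is a minimal userspace sketch in C. It is not kernel code and not
the helper anyone in this thread has written; SKETCH_PAGE_SIZE,
pages_needed(), block_to_pages() and pages_to_block() are hypothetical
names invented for illustration. It only shows the arithmetic of
spreading one large block over independently allocated page-sized
buffers, so that no higher order (physically contiguous) allocation is
required.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size for this sketch; the kernel's PAGE_SIZE is per-arch. */
#define SKETCH_PAGE_SIZE 4096u

/* Number of page-sized buffers needed to hold block_size bytes. */
static size_t pages_needed(size_t block_size)
{
    return (block_size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Scatter one logical block into separate page-sized buffers.
 * The buffers in pages[] need not be adjacent in memory.
 * Returns the number of pages written, or 0 if npages is too small.
 */
static size_t block_to_pages(const unsigned char *block, size_t block_size,
                             unsigned char **pages, size_t npages)
{
    size_t needed = pages_needed(block_size);
    size_t i;

    assert(block != NULL && pages != NULL);
    if (needed > npages)
        return 0;

    for (i = 0; i < needed; i++) {
        size_t off = i * SKETCH_PAGE_SIZE;
        size_t len = block_size - off;

        /* Every page is full except possibly the last one. */
        if (len > SKETCH_PAGE_SIZE)
            len = SKETCH_PAGE_SIZE;
        memcpy(pages[i], block + off, len);
    }
    return needed;
}

/* Gather the same pages back into one contiguous block. */
static void pages_to_block(unsigned char *block, size_t block_size,
                           unsigned char **pages)
{
    size_t off = 0;
    size_t i = 0;

    assert(block != NULL && pages != NULL);
    while (off < block_size) {
        size_t len = block_size - off;

        if (len > SKETCH_PAGE_SIZE)
            len = SKETCH_PAGE_SIZE;
        memcpy(block + off, pages[i], len);
        off += len;
        i++;
    }
}
```

With a 16 KiB block and 4 KiB pages this touches four order-0
buffers. The real work Nick describes (buffer heads, filesystem
support, DMA scatter-gather lists) is well beyond what this sketch
covers.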