From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Andy Whitcroft <apw@shadowen.org>
Cc: Christoph Lameter <clameter@sgi.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
linux-kernel@vger.kernel.org, Mel Gorman <mel@skynet.ie>,
William Lee Irwin III <wli@holomorphy.com>,
David Chinner <dgc@sgi.com>, Jens Axboe <jens.axboe@oracle.com>,
Badari Pulavarty <pbadari@gmail.com>,
Maxim Levitsky <maximlevitsky@gmail.com>
Subject: Re: [00/17] Large Blocksize Support V3
Date: Fri, 27 Apr 2007 01:08:17 +1000 [thread overview]
Message-ID: <4630C061.10309@yahoo.com.au> (raw)
In-Reply-To: <46309D16.70109@shadowen.org>
Andy Whitcroft wrote:
> Nick Piggin wrote:
>
>>I don't understand what you mean at all. A block has always been a
>>contiguous area of disk.
>
>
> Lets take Nick's definition of block being a disk based unit for the
> moment. That does not change the key contention here, that even with
> hardware specifically designed to handle 4k pages that hardware handles
> larger contigious areas more efficiently. David Chinner gives us
> figures showing major overall throughput improvements from (I assume)
> shorter scatter gather lists and better tag utilisation. I am loath to
> say we can just blame the hardware vendors for poor design.
So their controllers get double the throughput when going from 512K
(128x4K pages) to 2MB (128x16K pages) requests. Do you really think
it is to do with command processing overhead?
>>Actually, I don't know why people are so excited about being able to
>>use higher order allocations (I would rather be more excited about
>>never having to use them). But for those few places that really need
>>it, I'd rather see them use a virtually mapped kernel with proper
>>defragmentation rather than putting hacks all through the core code.
>
>
> Virtually mapping the kernel was considered pretty seriously around the
> time SPARSEMEM was being developed. However, that leads to a
> non-constant relation for converting kernel virtual addresses to
> physical ones which leads to significant complexity, not to mention
> runtime overhead.
Yeah, a page table walk (or better, a TLB hit). And yeah it will cost
a bit of performance, it always does.
> As a solution to the problem of supplying large pages from the allocator
> it seems somewhat unsatisfactory. If no significant other changes are
> made in support of large allocations, the process of defragmenting
> becomes very expensive. Requiring a stop_machine style hiatus while the
> physical copy and replace occurs for any kernel backed memory.
That would be a stupid thing to do though. All you need to do (after
you keep DMA away) is to unmap the pages.
> To put it a different way, even with such a full defragmentation scheme
> available some sort of avoidance scheme would be highly desirable to
> avoid using the very expensive deframentation underlying it.
Maybe. That doesn't change the fact that avoidance isn't a complete
solution by itself.
>>Is that a big problem? Really? You use 16K pages on your IPF systems,
>>don't you?
>
>
> To my knowledge, moving to a higher base page size has its advantages in
> TLB reach, but brings with it some pretty serious downsides. Especially
> in caching small files. Internal fragmentation in the page cache
> significantly affecting system performance. So much so that development
> is ongoing to see if supporting sub-base-page objects in the buffer
> cache could be beneficial.
I think 16K would be pretty reasonable (ia64 tends to use it). I guess
powerpc went to 64k either because that's what databases want or because
their TLB refills are too slow, so the internal fragmentation bites
them a lot harder.
But that was more of a side comment, because I still think io controllers
should be easily capable of operation on 4K pages. Graphics cards are,
aren't they?
--
SUSE Labs, Novell Inc.
next prev parent reply other threads:[~2007-04-26 15:08 UTC|newest]
Thread overview: 235+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-04-24 22:21 [00/17] Large Blocksize Support V3 clameter
2007-04-24 22:21 ` [01/17] Remove open coded implementation of memclear_highpage flush clameter
2007-04-24 22:21 ` [02/17] Fix page allocation flags in grow_dev_page() clameter
2007-04-24 22:21 ` [03/17] Fix: find_or_create_page does not spread memory clameter
2007-04-24 22:21 ` [04/17] Free up page->private for compound pages clameter
2007-04-24 22:21 ` [05/17] More compound page features clameter
2007-04-24 22:21 ` [06/17] Fix up handling of Compound head pages clameter
2007-04-24 22:21 ` [07/17] vmstat.c: Support accounting for compound pages clameter
2007-04-24 22:21 ` [08/17] Define functions for page cache handling clameter
2007-04-24 23:00 ` Eric Dumazet
2007-04-25 6:27 ` Christoph Lameter
2007-04-24 22:21 ` [09/17] Convert PAGE_CACHE_xxx -> page_cache_xxx function calls clameter
2007-04-24 22:21 ` [10/17] Variable Order Page Cache: Add clearing and flushing function clameter
2007-04-26 7:02 ` Christoph Lameter
2007-04-26 8:14 ` David Chinner
2007-04-24 22:21 ` [11/17] Readahead support for the variable order page cache clameter
2007-04-24 22:21 ` [12/17] Variable Page Cache Size: Fix up reclaim counters clameter
2007-04-24 22:21 ` [13/17] set_blocksize: Allow to set a larger block size than PAGE_SIZE clameter
2007-04-24 22:21 ` [14/17] Add VM_BUG_ONs to check for correct page order clameter
2007-04-24 22:21 ` [15/17] ramfs: Variable order page cache support clameter
2007-04-24 22:21 ` [16/17] ext2: " clameter
2007-04-24 22:21 ` [17/17] xfs: " clameter
2007-04-25 0:46 ` [00/17] Large Blocksize Support V3 Jörn Engel
2007-04-25 0:47 ` H. Peter Anvin
2007-04-25 3:11 ` William Lee Irwin III
2007-04-25 11:35 ` Jens Axboe
2007-04-25 15:36 ` Christoph Lameter
2007-04-25 17:53 ` Jens Axboe
2007-04-25 18:03 ` Christoph Lameter
2007-04-25 18:05 ` Jens Axboe
2007-04-25 18:14 ` Christoph Lameter
2007-04-25 18:16 ` Jens Axboe
2007-04-25 13:28 ` Mel Gorman
2007-04-25 15:23 ` Christoph Lameter
2007-04-25 22:46 ` Badari Pulavarty
2007-04-26 1:14 ` David Chinner
2007-04-26 1:17 ` David Chinner
2007-04-26 4:51 ` Eric W. Biederman
2007-04-26 5:05 ` Christoph Lameter
2007-04-26 5:44 ` Eric W. Biederman
2007-04-26 6:37 ` Christoph Lameter
2007-04-26 9:16 ` Mel Gorman
2007-04-26 6:38 ` Nick Piggin
2007-04-26 6:46 ` Christoph Lameter
2007-04-26 6:57 ` Nick Piggin
2007-04-26 7:10 ` Christoph Lameter
2007-04-26 7:22 ` Nick Piggin
2007-04-26 7:34 ` Christoph Lameter
2007-04-26 7:48 ` Nick Piggin
2007-04-26 9:20 ` David Chinner
2007-04-26 13:53 ` Avi Kivity
2007-04-26 14:33 ` David Chinner
2007-04-26 14:56 ` Avi Kivity
2007-04-26 15:20 ` Nick Piggin
2007-04-26 17:42 ` Jens Axboe
2007-04-26 18:59 ` Eric W. Biederman
2007-04-26 16:07 ` Christoph Hellwig
2007-04-27 10:05 ` Nick Piggin
2007-04-27 13:06 ` Mel Gorman
2007-04-26 13:50 ` William Lee Irwin III
2007-04-26 18:09 ` Eric W. Biederman
2007-04-26 23:34 ` William Lee Irwin III
2007-04-26 7:48 ` Questions on printk and console_drivers gshan
2007-04-26 10:06 ` [00/17] Large Blocksize Support V3 Mel Gorman
2007-04-26 14:47 ` Nick Piggin
2007-04-26 15:58 ` Christoph Hellwig
2007-04-26 16:05 ` Jens Axboe
2007-04-26 16:16 ` Christoph Hellwig
2007-04-26 13:28 ` Alan Cox
2007-04-26 13:30 ` Jens Axboe
2007-04-29 14:12 ` Matt Mackall
2007-04-28 10:55 ` Pierre Ossman
2007-04-28 15:39 ` Eric W. Biederman
2007-04-26 5:37 ` Nick Piggin
2007-04-26 6:38 ` David Chinner
2007-04-26 6:50 ` Nick Piggin
2007-04-26 8:40 ` Mel Gorman
2007-04-26 8:55 ` Nick Piggin
2007-04-26 10:30 ` Mel Gorman
2007-04-26 10:54 ` Eric W. Biederman
2007-04-26 12:23 ` Mel Gorman
2007-04-26 17:58 ` Christoph Lameter
2007-04-26 18:02 ` Jens Axboe
2007-04-26 16:11 ` Christoph Hellwig
2007-04-26 17:49 ` Eric W. Biederman
2007-04-26 18:03 ` Christoph Lameter
2007-04-26 18:03 ` Jens Axboe
2007-04-26 18:09 ` Christoph Hellwig
2007-04-26 18:12 ` Jens Axboe
2007-04-26 18:24 ` Christoph Hellwig
2007-04-26 18:24 ` Jens Axboe
2007-04-26 18:28 ` Christoph Lameter
2007-04-26 18:29 ` Jens Axboe
2007-04-26 18:35 ` Christoph Lameter
2007-04-26 18:39 ` Jens Axboe
2007-04-26 19:35 ` Eric W. Biederman
2007-04-26 19:42 ` Jens Axboe
2007-04-27 4:05 ` Eric W. Biederman
2007-04-27 10:26 ` Nick Piggin
2007-04-27 13:51 ` Eric W. Biederman
2007-04-26 20:22 ` Mel Gorman
2007-04-27 0:21 ` William Lee Irwin III
2007-04-27 5:16 ` Jens Axboe
2007-04-27 10:38 ` Nick Piggin
2007-04-26 10:10 ` Eric W. Biederman
2007-04-26 13:50 ` David Chinner
2007-04-26 14:40 ` William Lee Irwin III
2007-04-26 15:38 ` Nick Piggin
2007-04-26 15:58 ` William Lee Irwin III
2007-04-27 9:46 ` Nick Piggin
2007-04-27 0:19 ` Jeremy Higdon
2007-04-26 18:07 ` Christoph Lameter
2007-04-26 18:45 ` Eric W. Biederman
2007-04-26 18:59 ` Christoph Lameter
2007-04-26 19:21 ` Eric W. Biederman
2007-04-26 6:40 ` Christoph Lameter
2007-04-26 6:53 ` Nick Piggin
2007-04-26 7:04 ` David Chinner
2007-04-26 7:07 ` Nick Piggin
2007-04-26 7:11 ` Christoph Lameter
2007-04-26 7:17 ` Nick Piggin
2007-04-26 7:28 ` Christoph Lameter
2007-04-26 7:45 ` Nick Piggin
2007-04-26 18:10 ` Christoph Lameter
2007-04-27 10:08 ` Nick Piggin
2007-04-26 7:07 ` Christoph Lameter
2007-04-26 7:15 ` Nick Piggin
2007-04-26 7:22 ` Christoph Lameter
2007-04-26 7:42 ` Nick Piggin
2007-04-26 10:48 ` Mel Gorman
2007-04-26 12:37 ` Andy Whitcroft
2007-04-26 14:18 ` David Chinner
2007-04-26 15:08 ` Nick Piggin [this message]
2007-04-26 15:19 ` William Lee Irwin III
2007-04-26 15:28 ` David Chinner
2007-04-26 14:53 ` William Lee Irwin III
2007-04-26 18:16 ` Christoph Lameter
2007-04-26 18:21 ` Eric W. Biederman
2007-04-27 0:32 ` William Lee Irwin III
2007-04-27 10:22 ` Nick Piggin
2007-04-27 12:58 ` William Lee Irwin III
2007-04-27 13:06 ` Nick Piggin
2007-04-27 14:49 ` William Lee Irwin III
2007-04-26 18:13 ` Christoph Lameter
2007-04-27 10:15 ` Nick Piggin
2007-04-26 14:49 ` William Lee Irwin III
2007-04-26 18:50 ` Maxim Levitsky
2007-04-27 2:04 ` Andrew Morton
2007-04-27 2:27 ` David Chinner
2007-04-27 2:53 ` Andrew Morton
2007-04-27 3:47 ` [00/17] Large Blocksize Support V3 (mmap conceptual discussion) Christoph Lameter
2007-04-27 4:20 ` [00/17] Large Blocksize Support V3 David Chinner
2007-04-27 5:15 ` Andrew Morton
2007-04-27 5:49 ` Christoph Lameter
2007-04-27 6:55 ` Andrew Morton
2007-04-27 7:19 ` Christoph Lameter
2007-04-27 7:26 ` Andrew Morton
2007-04-27 8:37 ` David Chinner
2007-04-27 12:01 ` Christoph Lameter
2007-04-27 16:36 ` David Chinner
2007-04-27 17:34 ` David Chinner
2007-04-27 19:11 ` Andrew Morton
2007-04-28 1:43 ` Nick Piggin
2007-04-28 8:04 ` Peter Zijlstra
2007-04-28 8:22 ` Andrew Morton
2007-04-28 8:32 ` Peter Zijlstra
2007-04-28 8:55 ` Andrew Morton
2007-04-28 9:36 ` Peter Zijlstra
2007-04-28 14:09 ` William Lee Irwin III
2007-04-28 18:26 ` Andrew Morton
2007-04-28 19:19 ` William Lee Irwin III
2007-04-28 21:28 ` Andrew Morton
2007-04-28 3:17 ` David Chinner
2007-04-28 3:49 ` Christoph Lameter
2007-04-28 4:56 ` Andrew Morton
2007-04-28 5:08 ` Christoph Lameter
2007-04-28 5:36 ` Andrew Morton
2007-04-28 6:24 ` Christoph Lameter
2007-04-28 6:52 ` Andrew Morton
2007-04-30 5:30 ` Christoph Lameter
2007-04-28 9:43 ` Alan Cox
2007-04-28 9:58 ` Andrew Morton
2007-04-28 10:21 ` Alan Cox
2007-04-28 10:25 ` Andrew Morton
2007-04-28 11:29 ` Alan Cox
2007-04-28 14:37 ` William Lee Irwin III
2007-04-27 7:22 ` Christoph Lameter
2007-04-27 7:29 ` Andrew Morton
2007-04-27 7:35 ` Christoph Lameter
2007-04-27 7:43 ` Andrew Morton
2007-04-27 11:05 ` Paul Mackerras
2007-04-27 11:41 ` Nick Piggin
2007-04-27 12:12 ` Christoph Lameter
2007-04-27 12:25 ` Nick Piggin
2007-04-27 13:39 ` Christoph Hellwig
2007-04-28 2:27 ` Nick Piggin
2007-04-28 2:39 ` William Lee Irwin III
2007-04-28 2:50 ` Nick Piggin
2007-04-28 3:16 ` William Lee Irwin III
2007-04-28 8:16 ` Christoph Hellwig
2007-04-27 16:48 ` Christoph Lameter
2007-04-27 13:37 ` Christoph Hellwig
2007-04-27 12:14 ` Paul Mackerras
2007-04-27 12:36 ` Nick Piggin
2007-04-27 13:42 ` Christoph Hellwig
2007-04-27 11:58 ` Christoph Lameter
2007-04-27 13:44 ` William Lee Irwin III
2007-04-27 19:15 ` Andrew Morton
2007-04-28 2:21 ` William Lee Irwin III
2007-04-27 6:09 ` David Chinner
2007-04-27 7:04 ` Andrew Morton
2007-04-27 8:03 ` David Chinner
2007-04-27 8:48 ` Andrew Morton
2007-04-27 16:45 ` Theodore Tso
2007-05-04 13:33 ` Eric W. Biederman
2007-05-07 4:29 ` David Chinner
2007-05-07 4:48 ` Eric W. Biederman
2007-05-07 5:27 ` David Chinner
2007-05-07 6:43 ` Eric W. Biederman
2007-05-07 6:49 ` William Lee Irwin III
2007-05-07 7:06 ` William Lee Irwin III
2007-05-08 8:49 ` William Lee Irwin III
2007-05-07 16:06 ` Christoph Lameter
2007-05-07 17:29 ` William Lee Irwin III
2007-05-04 12:57 ` Eric W. Biederman
2007-05-04 13:31 ` Eric W. Biederman
2007-05-04 16:11 ` Christoph Lameter
2007-05-07 4:58 ` David Chinner
2007-05-07 6:56 ` Eric W. Biederman
2007-05-07 15:17 ` Weigert, Daniel
2007-04-27 16:55 ` Theodore Tso
2007-04-27 17:32 ` Nicholas Miell
2007-04-27 18:12 ` William Lee Irwin III
2007-04-28 16:39 ` Maxim Levitsky
2007-04-30 5:23 ` Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4630C061.10309@yahoo.com.au \
--to=nickpiggin@yahoo.com.au \
--cc=apw@shadowen.org \
--cc=clameter@sgi.com \
--cc=dgc@sgi.com \
--cc=ebiederm@xmission.com \
--cc=jens.axboe@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=maximlevitsky@gmail.com \
--cc=mel@skynet.ie \
--cc=pbadari@gmail.com \
--cc=wli@holomorphy.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox