From: mel@skynet.ie (Mel Gorman)
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Hellwig <hch@infradead.org>,
Christoph Lameter <clameter@sgi.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
linux-kernel@vger.kernel.org,
William Lee Irwin III <wli@holomorphy.com>,
David Chinner <dgc@sgi.com>, Jens Axboe <jens.axboe@oracle.com>,
Badari Pulavarty <pbadari@gmail.com>,
Maxim Levitsky <maximlevitsky@gmail.com>
Subject: Re: [00/17] Large Blocksize Support V3
Date: Fri, 27 Apr 2007 14:06:52 +0100 [thread overview]
Message-ID: <20070427130652.GG3645@skynet.ie> (raw)
In-Reply-To: <4631CAE7.2080109@yahoo.com.au>
On (27/04/07 20:05), Nick Piggin didst pronounce:
> Christoph Hellwig wrote:
> >On Thu, Apr 26, 2007 at 05:48:12PM +1000, Nick Piggin wrote:
> >
> >>>Well maybe you could explain what you want. Preferably without
> >>>redefining the established terms?
> >>
> >>Support for larger buffers than page cache pages.
> >
> >
> >I don't think you really want this :) The whole non-pagecache I/O
> >path before 2.3 was a toal pain just because it used buffers to drive
> >I/O. Add to that buffers bigger than a page and you add another
> >two mangnitudes of complexity. If you want to see a mess like that
> >download on of the eary XFS/Linux releases that had an I/O path
> >like that. I _really_ _really_ don't want to go there.
>
> I'm not actually suggesting to add anything like that. But I think
> larger blocks can be doable while retaining the "buffer" layer as a
> relatively simple pagecache to block translation.
>
> Anyway, I'm working on patches... they might crash and burn, but we
> might have something to talk about later.
>
>
> >Linux has a long tradition of trading a tiny bit of efficieny for
> >much cleaner code, and I'd for 100% go down Christoph's route here.
> >Then again I'd actually be rather surprised if > page buffers
> >were more efficient - you'd run into shitloads over overhead due to
> >them beeing non-contingous like calling vmap all over the place,
> >reprogramming iommus to at least make them look virtually contingous [1],
> >etc..
>
> I still think hardware should work reasonably well with 4K pages. The
> SGI io controllers and/or the Linux block layer that doesn't allow more
> than 128 sg entries is clearly suboptimal if the hardware runs twice as
> fast with 2MB submissions.
>
>
> >I also don't quite get what your problem with higher order allocations
> >are. order 1 allocations are generally just fine, and in fact
> >thread stacks are >= oder 1 on most architectures. And if the pagecache
> >uses higher order allocations that means we'll finally fix our problems
> >with them, which we have to do anyway. Workloads continue to grow and
> >with them the kernel overhead to manage them, while the pagesize for
> >many architectures is fixed. So we'll have to deal with order 1
> >and order 2 allocations better just for backing kmalloc and co.
>
> The pagecache is much bigger and often a lot more activity than these
> other things though. Also, the more things you add to higher order
> allocations, the more pressure you have.
>
> I like PAGE_SIZE pagecache, because it is reliable and really fast, if
> you need to reclaim a page it should be almost O(1).
>
>
> >Or think jumboframes for that matter.
>
> They can actually run into problems if the hardware wants contiguous
> memory.
>
> I don't know why you think the fragmentation issues are just magically
> fixed. It is hard and inefficient to reclaim larger order blocks (even
> with lumpy reclaim), and Mel's patches aren't perfect. Actually, last
> time I looked, they needed to keep at least 16MB of pages free to be
> reasonably effective (or do we just say that people with less than XMB
> of memory shouldn't be accessing these filesystems anyway?)
It'll work without adjusting the min_free_kbytes at all. The 16MB free had
better results after fragmentation stress tests but this was a few percent
of memory when allocating as huge pages as opposed to it falling apart. The
success rates were still way way higher than the vanilla kernel.
>, and I'm
> not sure if they have been tested for long term stability in the
> presence of a reasonable amount of higher order allocations.
>
I don't have a sample workload that has reasonable amount of higher order
allocations over longer period of time. When the next -mm comes out, SLUB will
be able to use high-order pages so I'll boot my machine with less memory to
pressure it more. Assuming the kernel boots on my desktop machine, I should
get some idea of what its long-term behaviour looks like.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
next prev parent reply other threads:[~2007-04-27 13:06 UTC|newest]
Thread overview: 235+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-04-24 22:21 [00/17] Large Blocksize Support V3 clameter
2007-04-24 22:21 ` [01/17] Remove open coded implementation of memclear_highpage flush clameter
2007-04-24 22:21 ` [02/17] Fix page allocation flags in grow_dev_page() clameter
2007-04-24 22:21 ` [03/17] Fix: find_or_create_page does not spread memory clameter
2007-04-24 22:21 ` [04/17] Free up page->private for compound pages clameter
2007-04-24 22:21 ` [05/17] More compound page features clameter
2007-04-24 22:21 ` [06/17] Fix up handling of Compound head pages clameter
2007-04-24 22:21 ` [07/17] vmstat.c: Support accounting for compound pages clameter
2007-04-24 22:21 ` [08/17] Define functions for page cache handling clameter
2007-04-24 23:00 ` Eric Dumazet
2007-04-25 6:27 ` Christoph Lameter
2007-04-24 22:21 ` [09/17] Convert PAGE_CACHE_xxx -> page_cache_xxx function calls clameter
2007-04-24 22:21 ` [10/17] Variable Order Page Cache: Add clearing and flushing function clameter
2007-04-26 7:02 ` Christoph Lameter
2007-04-26 8:14 ` David Chinner
2007-04-24 22:21 ` [11/17] Readahead support for the variable order page cache clameter
2007-04-24 22:21 ` [12/17] Variable Page Cache Size: Fix up reclaim counters clameter
2007-04-24 22:21 ` [13/17] set_blocksize: Allow to set a larger block size than PAGE_SIZE clameter
2007-04-24 22:21 ` [14/17] Add VM_BUG_ONs to check for correct page order clameter
2007-04-24 22:21 ` [15/17] ramfs: Variable order page cache support clameter
2007-04-24 22:21 ` [16/17] ext2: " clameter
2007-04-24 22:21 ` [17/17] xfs: " clameter
2007-04-25 0:46 ` [00/17] Large Blocksize Support V3 Jörn Engel
2007-04-25 0:47 ` H. Peter Anvin
2007-04-25 3:11 ` William Lee Irwin III
2007-04-25 11:35 ` Jens Axboe
2007-04-25 15:36 ` Christoph Lameter
2007-04-25 17:53 ` Jens Axboe
2007-04-25 18:03 ` Christoph Lameter
2007-04-25 18:05 ` Jens Axboe
2007-04-25 18:14 ` Christoph Lameter
2007-04-25 18:16 ` Jens Axboe
2007-04-25 13:28 ` Mel Gorman
2007-04-25 15:23 ` Christoph Lameter
2007-04-25 22:46 ` Badari Pulavarty
2007-04-26 1:14 ` David Chinner
2007-04-26 1:17 ` David Chinner
2007-04-26 4:51 ` Eric W. Biederman
2007-04-26 5:05 ` Christoph Lameter
2007-04-26 5:44 ` Eric W. Biederman
2007-04-26 6:37 ` Christoph Lameter
2007-04-26 9:16 ` Mel Gorman
2007-04-26 6:38 ` Nick Piggin
2007-04-26 6:46 ` Christoph Lameter
2007-04-26 6:57 ` Nick Piggin
2007-04-26 7:10 ` Christoph Lameter
2007-04-26 7:22 ` Nick Piggin
2007-04-26 7:34 ` Christoph Lameter
2007-04-26 7:48 ` Nick Piggin
2007-04-26 9:20 ` David Chinner
2007-04-26 13:53 ` Avi Kivity
2007-04-26 14:33 ` David Chinner
2007-04-26 14:56 ` Avi Kivity
2007-04-26 15:20 ` Nick Piggin
2007-04-26 17:42 ` Jens Axboe
2007-04-26 18:59 ` Eric W. Biederman
2007-04-26 16:07 ` Christoph Hellwig
2007-04-27 10:05 ` Nick Piggin
2007-04-27 13:06 ` Mel Gorman [this message]
2007-04-26 13:50 ` William Lee Irwin III
2007-04-26 18:09 ` Eric W. Biederman
2007-04-26 23:34 ` William Lee Irwin III
2007-04-26 7:48 ` Questions on printk and console_drivers gshan
2007-04-26 10:06 ` [00/17] Large Blocksize Support V3 Mel Gorman
2007-04-26 14:47 ` Nick Piggin
2007-04-26 15:58 ` Christoph Hellwig
2007-04-26 16:05 ` Jens Axboe
2007-04-26 16:16 ` Christoph Hellwig
2007-04-26 13:28 ` Alan Cox
2007-04-26 13:30 ` Jens Axboe
2007-04-29 14:12 ` Matt Mackall
2007-04-28 10:55 ` Pierre Ossman
2007-04-28 15:39 ` Eric W. Biederman
2007-04-26 5:37 ` Nick Piggin
2007-04-26 6:38 ` David Chinner
2007-04-26 6:50 ` Nick Piggin
2007-04-26 8:40 ` Mel Gorman
2007-04-26 8:55 ` Nick Piggin
2007-04-26 10:30 ` Mel Gorman
2007-04-26 10:54 ` Eric W. Biederman
2007-04-26 12:23 ` Mel Gorman
2007-04-26 17:58 ` Christoph Lameter
2007-04-26 18:02 ` Jens Axboe
2007-04-26 16:11 ` Christoph Hellwig
2007-04-26 17:49 ` Eric W. Biederman
2007-04-26 18:03 ` Christoph Lameter
2007-04-26 18:03 ` Jens Axboe
2007-04-26 18:09 ` Christoph Hellwig
2007-04-26 18:12 ` Jens Axboe
2007-04-26 18:24 ` Christoph Hellwig
2007-04-26 18:24 ` Jens Axboe
2007-04-26 18:28 ` Christoph Lameter
2007-04-26 18:29 ` Jens Axboe
2007-04-26 18:35 ` Christoph Lameter
2007-04-26 18:39 ` Jens Axboe
2007-04-26 19:35 ` Eric W. Biederman
2007-04-26 19:42 ` Jens Axboe
2007-04-27 4:05 ` Eric W. Biederman
2007-04-27 10:26 ` Nick Piggin
2007-04-27 13:51 ` Eric W. Biederman
2007-04-26 20:22 ` Mel Gorman
2007-04-27 0:21 ` William Lee Irwin III
2007-04-27 5:16 ` Jens Axboe
2007-04-27 10:38 ` Nick Piggin
2007-04-26 10:10 ` Eric W. Biederman
2007-04-26 13:50 ` David Chinner
2007-04-26 14:40 ` William Lee Irwin III
2007-04-26 15:38 ` Nick Piggin
2007-04-26 15:58 ` William Lee Irwin III
2007-04-27 9:46 ` Nick Piggin
2007-04-27 0:19 ` Jeremy Higdon
2007-04-26 18:07 ` Christoph Lameter
2007-04-26 18:45 ` Eric W. Biederman
2007-04-26 18:59 ` Christoph Lameter
2007-04-26 19:21 ` Eric W. Biederman
2007-04-26 6:40 ` Christoph Lameter
2007-04-26 6:53 ` Nick Piggin
2007-04-26 7:04 ` David Chinner
2007-04-26 7:07 ` Nick Piggin
2007-04-26 7:11 ` Christoph Lameter
2007-04-26 7:17 ` Nick Piggin
2007-04-26 7:28 ` Christoph Lameter
2007-04-26 7:45 ` Nick Piggin
2007-04-26 18:10 ` Christoph Lameter
2007-04-27 10:08 ` Nick Piggin
2007-04-26 7:07 ` Christoph Lameter
2007-04-26 7:15 ` Nick Piggin
2007-04-26 7:22 ` Christoph Lameter
2007-04-26 7:42 ` Nick Piggin
2007-04-26 10:48 ` Mel Gorman
2007-04-26 12:37 ` Andy Whitcroft
2007-04-26 14:18 ` David Chinner
2007-04-26 15:08 ` Nick Piggin
2007-04-26 15:19 ` William Lee Irwin III
2007-04-26 15:28 ` David Chinner
2007-04-26 14:53 ` William Lee Irwin III
2007-04-26 18:16 ` Christoph Lameter
2007-04-26 18:21 ` Eric W. Biederman
2007-04-27 0:32 ` William Lee Irwin III
2007-04-27 10:22 ` Nick Piggin
2007-04-27 12:58 ` William Lee Irwin III
2007-04-27 13:06 ` Nick Piggin
2007-04-27 14:49 ` William Lee Irwin III
2007-04-26 18:13 ` Christoph Lameter
2007-04-27 10:15 ` Nick Piggin
2007-04-26 14:49 ` William Lee Irwin III
2007-04-26 18:50 ` Maxim Levitsky
2007-04-27 2:04 ` Andrew Morton
2007-04-27 2:27 ` David Chinner
2007-04-27 2:53 ` Andrew Morton
2007-04-27 3:47 ` [00/17] Large Blocksize Support V3 (mmap conceptual discussion) Christoph Lameter
2007-04-27 4:20 ` [00/17] Large Blocksize Support V3 David Chinner
2007-04-27 5:15 ` Andrew Morton
2007-04-27 5:49 ` Christoph Lameter
2007-04-27 6:55 ` Andrew Morton
2007-04-27 7:19 ` Christoph Lameter
2007-04-27 7:26 ` Andrew Morton
2007-04-27 8:37 ` David Chinner
2007-04-27 12:01 ` Christoph Lameter
2007-04-27 16:36 ` David Chinner
2007-04-27 17:34 ` David Chinner
2007-04-27 19:11 ` Andrew Morton
2007-04-28 1:43 ` Nick Piggin
2007-04-28 8:04 ` Peter Zijlstra
2007-04-28 8:22 ` Andrew Morton
2007-04-28 8:32 ` Peter Zijlstra
2007-04-28 8:55 ` Andrew Morton
2007-04-28 9:36 ` Peter Zijlstra
2007-04-28 14:09 ` William Lee Irwin III
2007-04-28 18:26 ` Andrew Morton
2007-04-28 19:19 ` William Lee Irwin III
2007-04-28 21:28 ` Andrew Morton
2007-04-28 3:17 ` David Chinner
2007-04-28 3:49 ` Christoph Lameter
2007-04-28 4:56 ` Andrew Morton
2007-04-28 5:08 ` Christoph Lameter
2007-04-28 5:36 ` Andrew Morton
2007-04-28 6:24 ` Christoph Lameter
2007-04-28 6:52 ` Andrew Morton
2007-04-30 5:30 ` Christoph Lameter
2007-04-28 9:43 ` Alan Cox
2007-04-28 9:58 ` Andrew Morton
2007-04-28 10:21 ` Alan Cox
2007-04-28 10:25 ` Andrew Morton
2007-04-28 11:29 ` Alan Cox
2007-04-28 14:37 ` William Lee Irwin III
2007-04-27 7:22 ` Christoph Lameter
2007-04-27 7:29 ` Andrew Morton
2007-04-27 7:35 ` Christoph Lameter
2007-04-27 7:43 ` Andrew Morton
2007-04-27 11:05 ` Paul Mackerras
2007-04-27 11:41 ` Nick Piggin
2007-04-27 12:12 ` Christoph Lameter
2007-04-27 12:25 ` Nick Piggin
2007-04-27 13:39 ` Christoph Hellwig
2007-04-28 2:27 ` Nick Piggin
2007-04-28 2:39 ` William Lee Irwin III
2007-04-28 2:50 ` Nick Piggin
2007-04-28 3:16 ` William Lee Irwin III
2007-04-28 8:16 ` Christoph Hellwig
2007-04-27 16:48 ` Christoph Lameter
2007-04-27 13:37 ` Christoph Hellwig
2007-04-27 12:14 ` Paul Mackerras
2007-04-27 12:36 ` Nick Piggin
2007-04-27 13:42 ` Christoph Hellwig
2007-04-27 11:58 ` Christoph Lameter
2007-04-27 13:44 ` William Lee Irwin III
2007-04-27 19:15 ` Andrew Morton
2007-04-28 2:21 ` William Lee Irwin III
2007-04-27 6:09 ` David Chinner
2007-04-27 7:04 ` Andrew Morton
2007-04-27 8:03 ` David Chinner
2007-04-27 8:48 ` Andrew Morton
2007-04-27 16:45 ` Theodore Tso
2007-05-04 13:33 ` Eric W. Biederman
2007-05-07 4:29 ` David Chinner
2007-05-07 4:48 ` Eric W. Biederman
2007-05-07 5:27 ` David Chinner
2007-05-07 6:43 ` Eric W. Biederman
2007-05-07 6:49 ` William Lee Irwin III
2007-05-07 7:06 ` William Lee Irwin III
2007-05-08 8:49 ` William Lee Irwin III
2007-05-07 16:06 ` Christoph Lameter
2007-05-07 17:29 ` William Lee Irwin III
2007-05-04 12:57 ` Eric W. Biederman
2007-05-04 13:31 ` Eric W. Biederman
2007-05-04 16:11 ` Christoph Lameter
2007-05-07 4:58 ` David Chinner
2007-05-07 6:56 ` Eric W. Biederman
2007-05-07 15:17 ` Weigert, Daniel
2007-04-27 16:55 ` Theodore Tso
2007-04-27 17:32 ` Nicholas Miell
2007-04-27 18:12 ` William Lee Irwin III
2007-04-28 16:39 ` Maxim Levitsky
2007-04-30 5:23 ` Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070427130652.GG3645@skynet.ie \
--to=mel@skynet.ie \
--cc=clameter@sgi.com \
--cc=dgc@sgi.com \
--cc=ebiederm@xmission.com \
--cc=hch@infradead.org \
--cc=jens.axboe@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=maximlevitsky@gmail.com \
--cc=nickpiggin@yahoo.com.au \
--cc=pbadari@gmail.com \
--cc=wli@holomorphy.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.