From: Andrea Arcangeli <andrea@suse.de>
To: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <andrea@suse.de>,
Christoph Lameter <clameter@sgi.com>,
torvalds@linux-foundation.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
Mel Gorman <mel@skynet.ie>,
William Lee Irwin III <wli@holomorphy.com>,
David Chinner <dgc@sgi.com>, Jens Axboe <jens.axboe@oracle.com>,
Badari Pulavarty <pbadari@gmail.com>,
Maxim Levitsky <maximlevitsky@gmail.com>,
Fengguang Wu <fengguang.wu@gmail.com>,
swin wang <wangswin@gmail.com>,
totty.lu@gmail.com, hugh@veritas.com, joern@lazybastard.org
Subject: Re: [00/41] Large Blocksize Support V7 (adds memmap support)
Date: Tue, 11 Sep 2007 21:20:52 +0200 [thread overview]
Message-ID: <20070911192052.GA14675@v2.random> (raw)
In-Reply-To: <1189535461.32731.75.camel@localhost>
Hi,
On Tue, Sep 11, 2007 at 07:31:01PM +0100, Mel Gorman wrote:
> Now, the worst case scenario for your patch is that a hostile process
> allocates large amount of memory and mlocks() one 4K page per 64K chunk
> (this is unlikely in practice I know). The end result is you have many
> 64KB regions that are now unusable because 4K is pinned in each of them.
Initially 4k kmalloced tails aren't going to be mapped in
userland. But let's take the kernel stack that would generate the same
problem and that is clearly going to pin the whole 64k slab/slub
page.
What I think you're missing is that for Nick's worst case to trigger
with the config_page_shift design, you would need the _whole_ ram to
be _at_least_once_ allocated completely in kernel stacks. If the whole
100% of ram wouldn't go allocated in slub as a pure kernel stack, such
a scenario could never materialize.
With the SGI design + defrag, Nick's scenario can instead happen with
only total_ram/64k kernel stacks allocated.
The the problem with the slub fragmentation isn't a new problem, it
happens in today kernels as well and at least the slab by design is
meant to _defrag_ internally. So it's practically already solved and
it provides some guarantee unlike the buddy allocator.
> If it's my fault, sorry about that. It wasn't my intention.
It's not the fault of anyone, I simply didn't push too hard towards my
agenda for the reasons I just said, but I used any given opportunity
to discuss it.
With on-topic I meant not talking about it during the other topics,
like mmap_sem or RCU with radix tree lock ;)
> heh. Well we need to come to some sort of conclusion here or this will
> go around the merri-go-round till we're all bald.
Well, I only meant I'm still free to disagree if I think there's a
better way. All SGI has provided so far is data to show that their I/O
subsystem is much faster if the data is physically contiguous in ram
(ask Linus if you want more details, or better don't ask). That's not
very interesting data for my usages and with my hardware, and I guess
it's more likely that config_page_shift will produce interesting
numbers than their patch on my possible usage cases, but we'll never
know until both are finished.
> heh, I suggested printing the warning because I knew it had this
> problem. The purpose in my mind was to see how far the design could be
> brought before fs-block had to fill in the holes.
Indeed!
> I am still failing to see what happens when there are pagetable pages,
> slab objects or mlocked 4k pages pinning the 64K pages and you need to
> allocate another 64K page for the filesystem. I *think* you deadlock in
> a similar fashion to Christoph's approach but the shape of the problem
> is different because we are dealing with internal instead of external
> fragmentation. Am I wrong?
pagetables aren't the issue. They should be still pre-allocated in
page_size chunks. The 4k entries with 64k page-size are sure not worse
than a 32byte kmalloc today, the slab by design defragments the
stuff. There's probably room for improvement in that area even without
freeing any object by just ordering the list with an rbtree (or better
an heak like CFS should also use!!) so to always allocate new slabs
from the most full partial slab, that alone would help a lot probably
(not sure if slub does anything like that, I'm not fond on slub yet).
> Yes. I just think you have a different worst case that is just as bad.
Disagree here...
> small files (you need something like Shaggy's page tail packing),
> pagetable pages, pte pages all have to be dealt with. These are the
> things I think will cause us internal fragmentaiton problems.
Also note that not all users will need to turn on the tail
packing. We're here talking about features that not all users will
need anyway.. And we're in the same boat as ppc64, no difference.
Thanks!
next prev parent reply other threads:[~2007-09-11 19:20 UTC|newest]
Thread overview: 187+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-09-11 6:03 [00/41] Large Blocksize Support V7 (adds memmap support) Christoph Lameter
2007-09-10 18:52 ` Nick Piggin
2007-09-11 12:05 ` Andrea Arcangeli
2007-09-11 20:03 ` Christoph Lameter
2007-09-11 12:12 ` Jörn Engel
2007-09-10 21:13 ` Nick Piggin
2007-09-11 16:02 ` Goswin von Brederlow
2007-09-11 20:07 ` Christoph Lameter
2007-09-11 20:29 ` Jörn Engel
2007-09-11 20:41 ` Christoph Lameter
2007-09-11 23:26 ` Andrea Arcangeli
2007-09-12 0:04 ` Christoph Lameter
2007-09-12 8:20 ` Andrea Arcangeli
2007-09-15 8:44 ` Andrew Morton
2007-09-15 12:14 ` Goswin von Brederlow
2007-09-15 15:51 ` Andrea Arcangeli
2007-09-15 20:14 ` Goswin von Brederlow
2007-09-15 22:30 ` Andrea Arcangeli
2007-09-16 13:54 ` Goswin von Brederlow
2007-09-16 15:08 ` Andrea Arcangeli
2007-09-16 21:08 ` Mel Gorman
2007-09-16 22:48 ` Goswin von Brederlow
2007-09-17 9:30 ` Mel Gorman
2007-09-16 17:46 ` Jörn Engel
2007-09-16 18:15 ` Linus Torvalds
2007-09-16 18:21 ` Jörn Engel
2007-09-16 18:44 ` Linus Torvalds
2007-09-16 22:51 ` Goswin von Brederlow
2007-09-23 17:44 ` Jörn Engel
2007-09-16 22:06 ` Goswin von Brederlow
2007-09-16 22:40 ` Jörn Engel
2007-09-16 18:15 ` Mel Gorman
2007-09-16 18:50 ` Andrea Arcangeli
2007-09-16 20:54 ` Mel Gorman
2007-09-16 21:31 ` Andrea Arcangeli
2007-09-17 10:13 ` Mel Gorman
2007-09-23 5:50 ` Goswin von Brederlow
2007-09-16 22:56 ` Goswin von Brederlow
2007-09-18 19:31 ` Andrea Arcangeli
2007-09-23 6:56 ` Goswin von Brederlow
2007-09-24 15:39 ` Andrea Arcangeli
2007-09-16 18:13 ` Mel Gorman
2007-09-16 9:03 ` Nick Piggin
2007-09-17 22:00 ` Christoph Lameter
2007-09-18 0:11 ` Nick Piggin
2007-09-18 20:36 ` Christoph Lameter
2007-09-18 10:00 ` Mel Gorman
2007-09-18 10:49 ` Jörn Engel
2007-09-18 12:31 ` David Chinner
2007-09-16 21:58 ` Goswin von Brederlow
2007-09-17 10:03 ` Mel Gorman
2007-09-23 6:22 ` Goswin von Brederlow
2007-09-24 12:32 ` Kyle Moffett
2007-09-16 17:53 ` Jörn Engel
2007-09-16 21:31 ` Mel Gorman
2007-09-17 22:03 ` Christoph Lameter
2007-09-11 15:36 ` Mel Gorman
2007-09-11 1:44 ` Nick Piggin
2007-09-11 20:11 ` Christoph Lameter
2007-09-11 4:53 ` Nick Piggin
2007-09-11 20:42 ` Christoph Lameter
2007-09-11 5:30 ` Nick Piggin
2007-09-11 21:41 ` Christoph Lameter
2007-09-11 6:06 ` Nick Piggin
2007-09-11 21:52 ` Christoph Lameter
2007-09-11 18:07 ` Nick Piggin
2007-09-12 23:06 ` Christoph Lameter
2007-09-13 20:51 ` Nick Piggin
2007-09-14 17:52 ` Christoph Lameter
2007-09-16 8:22 ` Nick Piggin
2007-09-17 22:05 ` Christoph Lameter
2007-09-18 0:10 ` Nick Piggin
2007-09-18 20:42 ` Christoph Lameter
2007-09-17 11:10 ` Bernd Schmidt
2007-09-17 22:10 ` Christoph Lameter
2007-09-14 16:10 ` Goswin von Brederlow
2007-09-14 17:42 ` Mel Gorman
2007-09-15 0:31 ` Goswin von Brederlow
2007-09-16 21:16 ` Mel Gorman
2007-09-16 22:38 ` Goswin von Brederlow
2007-09-17 8:57 ` Mel Gorman
2007-09-23 6:49 ` Goswin von Brederlow
2007-09-11 20:53 ` Mel Gorman
2007-09-11 6:00 ` Nick Piggin
2007-09-11 21:48 ` Christoph Lameter
2007-09-11 6:17 ` Nick Piggin
2007-09-12 0:00 ` Christoph Lameter
2007-09-12 2:46 ` Nick Piggin
2007-09-12 23:17 ` Christoph Lameter
2007-09-13 9:40 ` Mel Gorman
2007-09-14 2:38 ` Christoph Lameter
2007-09-13 21:20 ` Nick Piggin
2007-09-14 18:08 ` Christoph Lameter
2007-09-14 18:15 ` Christoph Lameter
2007-09-15 0:33 ` Goswin von Brederlow
2007-09-16 8:53 ` Nick Piggin
2007-09-17 22:21 ` Christoph Lameter
2007-09-18 1:16 ` Nick Piggin
2007-09-18 18:30 ` Linus Torvalds
2007-09-18 17:53 ` Nick Piggin
2007-09-18 19:18 ` Andrea Arcangeli
2007-09-18 19:44 ` Linus Torvalds
2007-09-19 0:58 ` Nathan Scott
2007-09-19 1:06 ` Linus Torvalds
2007-09-19 2:45 ` Nathan Scott
2007-09-19 5:09 ` David Chinner
2007-09-19 9:41 ` Alex Tomas
2007-09-19 14:04 ` Andrea Arcangeli
2007-09-20 1:38 ` David Chinner
2007-09-20 14:54 ` Andrea Arcangeli
2007-09-20 18:11 ` Christoph Lameter
2007-09-20 18:07 ` Christoph Lameter
2007-09-21 20:41 ` Hugh Dickins
2007-09-24 21:13 ` Christoph Lameter
2007-09-28 2:46 ` Nick Piggin
2007-09-19 3:41 ` Rene Herman
2007-09-19 3:50 ` Linus Torvalds
2007-09-19 4:26 ` Rene Herman
2007-09-19 4:33 ` Linus Torvalds
2007-09-19 4:56 ` Rene Herman
2007-09-11 21:54 ` Mel Gorman
2007-09-12 14:29 ` Martin J. Bligh
2007-09-12 1:49 ` David Chinner
2007-09-11 15:27 ` Nick Piggin
2007-09-13 1:49 ` David Chinner
2007-09-12 17:23 ` Nick Piggin
2007-09-13 13:03 ` David Chinner
2007-09-13 2:01 ` Nick Piggin
2007-09-13 20:48 ` Nick Piggin
2007-09-17 4:07 ` David Chinner
2007-09-16 21:13 ` Nick Piggin
2007-09-12 2:01 ` Nick Piggin
2007-09-11 21:35 ` Christoph Lameter
2007-09-11 16:47 ` Andrea Arcangeli
2007-09-11 18:31 ` Mel Gorman
2007-09-11 2:26 ` Nick Piggin
2007-09-11 18:25 ` Maxim Levitsky
2007-09-11 3:05 ` Nick Piggin
2007-09-11 21:03 ` Mel Gorman
2007-09-11 19:20 ` Andrea Arcangeli [this message]
2007-09-11 20:19 ` Jörn Engel
2007-09-11 20:13 ` Christoph Lameter
2007-09-11 20:01 ` Christoph Lameter
2007-09-11 4:43 ` Nick Piggin
2007-09-11 5:17 ` Nick Piggin
2007-09-11 21:27 ` Mel Gorman
2007-09-11 6:03 ` [01/41] Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user Christoph Lameter
2007-09-11 6:03 ` [02/41] Define functions for page cache handling Christoph Lameter
2007-09-11 6:03 ` [03/41] Use page_cache_xxx functions in mm/filemap.c Christoph Lameter
2007-09-11 6:03 ` [04/41] Use page_cache_xxx in mm/page-writeback.c Christoph Lameter
2007-09-11 6:03 ` [05/41] Use page_cache_xxx in mm/truncate.c Christoph Lameter
2007-09-11 6:03 ` [06/41] Use page_cache_xxx in mm/rmap.c Christoph Lameter
2007-09-11 6:03 ` [07/41] Use page_cache_xxx in mm/filemap_xip.c Christoph Lameter
2007-09-11 6:03 ` [08/41] Use page_cache_xxx in mm/migrate.c Christoph Lameter
2007-09-11 6:03 ` [09/41] Use page_cache_xxx in fs/libfs.c Christoph Lameter
2007-09-11 6:04 ` [10/41] Use page_cache_xxx in fs/sync Christoph Lameter
2007-09-11 6:04 ` [11/41] Use page_cache_xxx in fs/buffer.c Christoph Lameter
2007-09-11 6:04 ` [12/41] Use page_cache_xxx in mm/mpage.c Christoph Lameter
2007-09-11 6:04 ` [13/41] Use page_cache_xxx in mm/fadvise.c Christoph Lameter
2007-09-11 6:04 ` [14/41] Use page_cache_xxx in fs/splice.c Christoph Lameter
2007-09-11 6:04 ` [15/41] Use page_cache_xxx in ext2 Christoph Lameter
2007-09-11 6:04 ` [16/41] Use page_cache_xxx in fs/ext3 Christoph Lameter
2007-09-11 6:04 ` [17/41] Use page_cache_xxx in fs/ext4 Christoph Lameter
2007-09-11 6:04 ` [18/41] Use page_cache_xxx in fs/reiserfs Christoph Lameter
2007-09-11 6:04 ` [19/41] Use page_cache_xxx for fs/xfs Christoph Lameter
2007-09-11 6:04 ` [20/41] Use page_cache_xxx in drivers/block/rd.c Christoph Lameter
2007-09-11 6:04 ` [21/41] compound pages: Better PageHead/PageTail handling Christoph Lameter
2007-09-11 6:04 ` [22/41] compound pages: Add new support functions Christoph Lameter
2007-09-11 6:04 ` [23/41] compound pages: vmstat support Christoph Lameter
2007-09-11 6:04 ` [24/41] compound pages: Use new compound vmstat functions in SLUB Christoph Lameter
2007-09-11 6:04 ` [25/41] compound pages: Allow use of get_page_unless_zero with compound pages Christoph Lameter
2007-09-11 6:04 ` [26/41] compound pages: Allow freeing of compound pages via pagevec Christoph Lameter
2007-09-11 6:04 ` [27/41] Large page order operations, zeroing and flushing Christoph Lameter
2007-09-11 6:04 ` [28/41] Futex: Fix PAGE SIZE assumption Christoph Lameter
2007-09-11 6:04 ` [29/41] Fix up reclaim counters Christoph Lameter
2007-09-11 6:04 ` [30/41] Add VM_BUG_ONs to check for correct page order Christoph Lameter
2007-09-11 6:04 ` [31/41] Large Blocksize: Core piece Christoph Lameter
2007-09-11 6:04 ` [32/41] Readahead changes to support large blocksize Christoph Lameter
2007-09-11 6:04 ` [33/41] Large blocksize support in ramfs Christoph Lameter
2007-09-11 6:04 ` [34/41] Large blocksize support for XFS Christoph Lameter
2007-09-11 6:04 ` [35/41] Reiserfs: Fix up mapping_set_gfp_mask() Christoph Lameter
2007-09-11 6:04 ` [36/41] 64k block size support for Ext2/3/4 Christoph Lameter
2007-09-11 6:04 ` [37/41] ext2: fix rec_len overflow for 64KB block size Christoph Lameter
2007-09-11 6:04 ` [38/41] ext3: fix rec_len overflow with " Christoph Lameter
2007-09-11 6:04 ` [39/41] ext4: fix rec_len overflow for " Christoph Lameter
2007-09-11 6:04 ` [40/41] Do not use f_mapping in simple_prepare_write() Christoph Lameter
2007-09-11 6:04 ` [41/41] Mmap support using pte PAGE_SIZE mappings Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070911192052.GA14675@v2.random \
--to=andrea@suse.de \
--cc=clameter@sgi.com \
--cc=dgc@sgi.com \
--cc=fengguang.wu@gmail.com \
--cc=hch@lst.de \
--cc=hugh@veritas.com \
--cc=jens.axboe@oracle.com \
--cc=joern@lazybastard.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maximlevitsky@gmail.com \
--cc=mel@csn.ul.ie \
--cc=mel@skynet.ie \
--cc=pbadari@gmail.com \
--cc=torvalds@linux-foundation.org \
--cc=totty.lu@gmail.com \
--cc=wangswin@gmail.com \
--cc=wli@holomorphy.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).