From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Mel Gorman <mel@skynet.ie>
Cc: Christoph Lameter <clameter@sgi.com>,
andrea@suse.de, torvalds@linux-foundation.org,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
Christoph Hellwig <hch@lst.de>,
William Lee Irwin III <wli@holomorphy.com>,
David Chinner <dgc@sgi.com>, Jens Axboe <jens.axboe@oracle.com>,
Badari Pulavarty <pbadari@gmail.com>,
Maxim Levitsky <maximlevitsky@gmail.com>,
Fengguang Wu <fengguang.wu@gmail.com>,
swin wang <wangswin@gmail.com>,
totty.lu@gmail.com, hugh@veritas.com, joern@lazybastard.org
Subject: Re: [00/41] Large Blocksize Support V7 (adds memmap support)
Date: Tue, 11 Sep 2007 16:00:17 +1000 [thread overview]
Message-ID: <200709111600.18756.nickpiggin@yahoo.com.au> (raw)
In-Reply-To: <20070911205350.GA18127@skynet.ie>
On Wednesday 12 September 2007 06:53, Mel Gorman wrote:
> On (11/09/07 11:44), Nick Piggin didst pronounce:
> However, this discussion belongs more with the non-existant-remove-slab
> patch. Based on what we've seen since the summits, we need a thorough
> analysis with benchmarks before making a final decision (kernbench, ebizzy,
> tbench (netpipe if someone has the time/resources), hackbench and maybe
> sysbench as well as something the filesystem people recommend to get good
> coverage of the subsystems).
True. Aside, it seems I might have been mistaken in saying Christoph
is proposing to use higher order allocations to fix the SLUB regression.
Anyway, I agree let's not get sidetracked about this here.
> I'd rather not get side-tracked here. I regret you feel stream-rolled but I
> think grouping pages by mobility is the right thing to do for better usage
> of the TLB by the kernel and for improving hugepage support in userspace
> minimally. We never really did see eye-to-eye but this way, if I'm wrong
> you get to chuck eggs down the line.
No it's a fair point, and even the hugepage allocations alone are a fair
point. From the discussions I think it seems like quite probably the right
thing to do pragmatically, which is what Linux is about and I hope will
result in a better kernel in the end. So I don't have complaints except
from little ivory tower ;)
> > Sure. And some people run workloads where fragmentation is likely never
> > going to be a problem, they are shipping this poorly configured hardware
> > now or soon, so they don't have too much interest in doing it right at
> > this point, rather than doing it *now*. OK, that's a valid reason which
> > is why I don't use the argument that we should do it correctly or never
> > at all.
>
> So are we saying the right thing to do is go with fs-block from day 1 once
> we get it to optimistically use high-order pages? I think your concern
> might be that if this goes in then it'll be harder to justify fsblock in
> the future because it'll be solving a theoritical problem that takes months
> to trigger if at all. i.e. The filesystem people will push because
> apparently large block support as it is solves world peace. Is that
> accurate?
Heh. It's hard to say. I think fsblock could take a while to implement,
regardless of high order pages or not. I actually would like to be able
to pass down a mandate to say higher order pagecache will never
get merged, simply so that these talented people would work on
fsblock ;)
But that's not my place to say, and I'm actually not arguing that high
order pagecache does not have uses (especially as a practical,
shorter-term solution which is unintrusive to filesystems).
So no, I don't think I'm really going against the basics of what we agreed
in Cambridge. But it sounds like it's still being billed as first-order
support right off the bat here.
> > OTOH, I'm not sure how much buy-in there was from the filesystems guys.
> > Particularly Christoph H and XFS (which is strange because they already
> > do vmapping in places).
>
> I think they use vmapping because they have to, not because they want
> to. They might be a lot happier with fsblock if it used contiguous pages
> for large blocks whenever possible - I don't know for sure. The metadata
> accessors they might be unhappy with because it's inconvenient but as
> Christoph Hellwig pointed out at VM/FS, the filesystems who really care
> will convert.
Sure, they would rather not to. But there are also a lot of ways you can
improve vmap more than what XFS does (or probably what darwin does)
(more persistence for cached objects, and batched invalidates for example).
There are also a lot of trivial things you can do to make a lot of those
accesses not require vmaps (and less trivial things, but even such things
as binary searches over multiple pages should be quite possible with a bit
of logic).
> > It would be interesting to craft an attack. If you knew roughly the
> > layout and size of your dentry slab for example... maybe you could stat a
> > whole lot of files, then open one and keep it open (maybe post the fd to
> > a unix socket or something crazy!) when you think you have filled up a
> > couple of MB worth of them.
>
> I might regret saying this, but it would be easier to craft an attack
> using pagetable pages. It's woefully difficult to do but it's probably
> doable. I say pagetables because while slub targetted reclaim is on the
> cards and memory compaction exists for page cache pages, pagetables are
> currently pinned with no prototype patch existing to deal with them.
But even so, you can just hold an open fd in order to pin the dentry you
want. My attack would go like this: get the page size and allocation group
size for the machine, then get the number of dentries required to fill a
slab. Then read in that many dentries and pin one of them. Repeat the
process. Even if there is other activity on the system, it seems possible
that such a thing will cause some headaches after not too long a time.
Some sources of pinned memory are going to be better than others for
this of course, so yeah maybe pagetables will be a bit easier (I don't know).
> > Then I would love to say #2 will go ahead (and I hope it would), but I
> > can't force it down the throat of the filesystem maintainers just like I
> > feel they can't force vm devs (me) to do a virtually mapped and
> > defrag-able kernel :) Basically I'm trying to practice what I preach and
> > I don't want to force fsblock onto anyone.
>
> If the FS people really want it and they insist that this has to be a
> #1 citizen then it's fsblock or make something new up.
Well I'm glad you agree :) I think not all do, but as you say maybe the
only thing is just to leave it up to the individual filesystems...
next prev parent reply other threads:[~2007-09-11 21:42 UTC|newest]
Thread overview: 206+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-09-11 6:03 [00/41] Large Blocksize Support V7 (adds memmap support) Christoph Lameter
2007-09-10 18:52 ` Nick Piggin
2007-09-11 12:05 ` Andrea Arcangeli
2007-09-11 20:03 ` Christoph Lameter
2007-09-11 12:12 ` Jörn Engel
2007-09-11 12:12 ` Jörn Engel
2007-09-10 21:13 ` Nick Piggin
2007-09-10 21:13 ` Nick Piggin
2007-09-11 16:02 ` Goswin von Brederlow
2007-09-11 20:07 ` Christoph Lameter
2007-09-11 20:29 ` Jörn Engel
2007-09-11 20:29 ` Jörn Engel
2007-09-11 20:41 ` Christoph Lameter
2007-09-11 23:26 ` Andrea Arcangeli
2007-09-12 0:04 ` Christoph Lameter
2007-09-12 8:20 ` Andrea Arcangeli
2007-09-15 8:44 ` Andrew Morton
2007-09-15 8:44 ` Andrew Morton
2007-09-15 12:14 ` Goswin von Brederlow
2007-09-15 12:14 ` Goswin von Brederlow
2007-09-15 15:51 ` Andrea Arcangeli
2007-09-15 20:14 ` Goswin von Brederlow
2007-09-15 22:30 ` Andrea Arcangeli
2007-09-16 13:54 ` Goswin von Brederlow
2007-09-16 15:08 ` Andrea Arcangeli
2007-09-16 21:08 ` Mel Gorman
2007-09-16 22:48 ` Goswin von Brederlow
2007-09-17 9:30 ` Mel Gorman
2007-09-16 17:46 ` Jörn Engel
2007-09-16 18:15 ` Linus Torvalds
2007-09-16 18:15 ` Linus Torvalds
2007-09-16 18:21 ` Jörn Engel
2007-09-16 18:21 ` Jörn Engel
2007-09-16 18:44 ` Linus Torvalds
2007-09-16 18:44 ` Linus Torvalds
2007-09-16 22:51 ` Goswin von Brederlow
2007-09-16 22:51 ` Goswin von Brederlow
2007-09-23 17:44 ` Jörn Engel
2007-09-23 17:44 ` Jörn Engel
2007-09-16 22:06 ` Goswin von Brederlow
2007-09-16 22:06 ` Goswin von Brederlow
2007-09-16 22:40 ` Jörn Engel
2007-09-16 22:40 ` Jörn Engel
2007-09-16 18:15 ` Mel Gorman
2007-09-16 18:50 ` Andrea Arcangeli
2007-09-16 20:54 ` Mel Gorman
2007-09-16 21:31 ` Andrea Arcangeli
2007-09-17 10:13 ` Mel Gorman
2007-09-23 5:50 ` Goswin von Brederlow
2007-09-16 22:56 ` Goswin von Brederlow
2007-09-18 19:31 ` Andrea Arcangeli
2007-09-23 6:56 ` Goswin von Brederlow
2007-09-24 15:39 ` Andrea Arcangeli
2007-09-16 18:13 ` Mel Gorman
2007-09-16 18:13 ` Mel Gorman
2007-09-16 9:03 ` Nick Piggin
2007-09-17 22:00 ` Christoph Lameter
2007-09-18 0:11 ` Nick Piggin
2007-09-18 20:36 ` Christoph Lameter
2007-09-18 10:00 ` Mel Gorman
2007-09-18 10:49 ` Jörn Engel
2007-09-18 10:49 ` Jörn Engel
2007-09-18 12:31 ` David Chinner
2007-09-16 21:58 ` Goswin von Brederlow
2007-09-16 21:58 ` Goswin von Brederlow
2007-09-17 10:03 ` Mel Gorman
2007-09-17 10:03 ` Mel Gorman
2007-09-23 6:22 ` Goswin von Brederlow
2007-09-24 12:32 ` Kyle Moffett
2007-09-16 17:53 ` Jörn Engel
2007-09-16 17:53 ` Jörn Engel
2007-09-16 21:31 ` Mel Gorman
2007-09-16 21:31 ` Mel Gorman
2007-09-17 22:03 ` Christoph Lameter
2007-09-11 15:36 ` Mel Gorman
2007-09-11 1:44 ` Nick Piggin
2007-09-11 20:11 ` Christoph Lameter
2007-09-11 4:53 ` Nick Piggin
2007-09-11 20:42 ` Christoph Lameter
2007-09-11 5:30 ` Nick Piggin
2007-09-11 21:41 ` Christoph Lameter
2007-09-11 6:06 ` Nick Piggin
2007-09-11 21:52 ` Christoph Lameter
2007-09-11 18:07 ` Nick Piggin
2007-09-12 23:06 ` Christoph Lameter
2007-09-13 20:51 ` Nick Piggin
2007-09-14 17:52 ` Christoph Lameter
2007-09-16 8:22 ` Nick Piggin
2007-09-17 22:05 ` Christoph Lameter
2007-09-18 0:10 ` Nick Piggin
2007-09-18 20:42 ` Christoph Lameter
2007-09-17 11:10 ` Bernd Schmidt
2007-09-17 22:10 ` Christoph Lameter
2007-09-14 16:10 ` Goswin von Brederlow
2007-09-14 17:42 ` Mel Gorman
2007-09-15 0:31 ` Goswin von Brederlow
2007-09-16 21:16 ` Mel Gorman
2007-09-16 22:38 ` Goswin von Brederlow
2007-09-17 8:57 ` Mel Gorman
2007-09-23 6:49 ` Goswin von Brederlow
2007-09-11 20:53 ` Mel Gorman
2007-09-11 6:00 ` Nick Piggin [this message]
2007-09-11 21:48 ` Christoph Lameter
2007-09-11 6:17 ` Nick Piggin
2007-09-12 0:00 ` Christoph Lameter
2007-09-12 2:46 ` Nick Piggin
2007-09-12 23:17 ` Christoph Lameter
2007-09-13 9:40 ` Mel Gorman
2007-09-14 2:38 ` Christoph Lameter
2007-09-13 21:20 ` Nick Piggin
2007-09-14 18:08 ` Christoph Lameter
2007-09-14 18:15 ` Christoph Lameter
2007-09-15 0:33 ` Goswin von Brederlow
2007-09-16 8:53 ` Nick Piggin
2007-09-17 22:21 ` Christoph Lameter
2007-09-18 1:16 ` Nick Piggin
2007-09-18 18:30 ` Linus Torvalds
2007-09-18 17:53 ` Nick Piggin
2007-09-18 19:18 ` Andrea Arcangeli
2007-09-18 19:44 ` Linus Torvalds
2007-09-19 0:58 ` Nathan Scott
2007-09-19 1:06 ` Linus Torvalds
2007-09-19 2:45 ` Nathan Scott
2007-09-19 5:09 ` David Chinner
2007-09-19 9:41 ` Alex Tomas
2007-09-19 14:04 ` Andrea Arcangeli
2007-09-20 1:38 ` David Chinner
2007-09-20 14:54 ` Andrea Arcangeli
2007-09-20 18:11 ` Christoph Lameter
2007-09-20 18:07 ` Christoph Lameter
2007-09-21 20:41 ` Hugh Dickins
2007-09-24 21:13 ` Christoph Lameter
2007-09-28 2:46 ` Nick Piggin
2007-09-19 3:41 ` Rene Herman
2007-09-19 3:50 ` Linus Torvalds
2007-09-19 4:26 ` Rene Herman
2007-09-19 4:33 ` Linus Torvalds
2007-09-19 4:56 ` Rene Herman
2007-09-11 21:54 ` Mel Gorman
2007-09-12 14:29 ` Martin J. Bligh
2007-09-12 1:49 ` David Chinner
2007-09-11 15:27 ` Nick Piggin
2007-09-13 1:49 ` David Chinner
2007-09-12 17:23 ` Nick Piggin
2007-09-13 13:03 ` David Chinner
2007-09-13 2:01 ` Nick Piggin
2007-09-13 20:48 ` Nick Piggin
2007-09-17 4:07 ` David Chinner
2007-09-16 21:13 ` Nick Piggin
2007-09-12 2:01 ` Nick Piggin
2007-09-11 21:35 ` Christoph Lameter
2007-09-11 16:47 ` Andrea Arcangeli
2007-09-11 18:31 ` Mel Gorman
2007-09-11 2:26 ` Nick Piggin
2007-09-11 18:25 ` Maxim Levitsky
2007-09-11 3:05 ` Nick Piggin
2007-09-11 21:03 ` Mel Gorman
2007-09-11 19:20 ` Andrea Arcangeli
2007-09-11 20:19 ` Jörn Engel
2007-09-11 20:19 ` Jörn Engel
2007-09-11 20:13 ` Christoph Lameter
2007-09-11 20:01 ` Christoph Lameter
2007-09-11 4:43 ` Nick Piggin
2007-09-11 5:17 ` Nick Piggin
2007-09-11 21:27 ` Mel Gorman
2007-09-11 6:03 ` [01/41] Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user Christoph Lameter
2007-09-11 6:03 ` [02/41] Define functions for page cache handling Christoph Lameter
2007-09-11 6:03 ` [03/41] Use page_cache_xxx functions in mm/filemap.c Christoph Lameter
2007-09-11 6:03 ` [04/41] Use page_cache_xxx in mm/page-writeback.c Christoph Lameter
2007-09-11 6:03 ` [05/41] Use page_cache_xxx in mm/truncate.c Christoph Lameter
2007-09-11 6:03 ` [06/41] Use page_cache_xxx in mm/rmap.c Christoph Lameter
2007-09-11 6:03 ` [07/41] Use page_cache_xxx in mm/filemap_xip.c Christoph Lameter
2007-09-11 6:03 ` [08/41] Use page_cache_xxx in mm/migrate.c Christoph Lameter
2007-09-11 6:03 ` [09/41] Use page_cache_xxx in fs/libfs.c Christoph Lameter
2007-09-11 6:04 ` [10/41] Use page_cache_xxx in fs/sync Christoph Lameter
2007-09-11 6:04 ` [11/41] Use page_cache_xxx in fs/buffer.c Christoph Lameter
2007-09-11 6:04 ` [12/41] Use page_cache_xxx in mm/mpage.c Christoph Lameter
2007-09-11 6:04 ` [13/41] Use page_cache_xxx in mm/fadvise.c Christoph Lameter
2007-09-11 6:04 ` [14/41] Use page_cache_xxx in fs/splice.c Christoph Lameter
2007-09-11 6:04 ` [15/41] Use page_cache_xxx in ext2 Christoph Lameter
2007-09-11 6:04 ` [16/41] Use page_cache_xxx in fs/ext3 Christoph Lameter
2007-09-11 6:04 ` [17/41] Use page_cache_xxx in fs/ext4 Christoph Lameter
2007-09-11 6:04 ` [18/41] Use page_cache_xxx in fs/reiserfs Christoph Lameter
2007-09-11 6:04 ` [19/41] Use page_cache_xxx for fs/xfs Christoph Lameter
2007-09-11 6:04 ` [20/41] Use page_cache_xxx in drivers/block/rd.c Christoph Lameter
2007-09-11 6:04 ` [21/41] compound pages: Better PageHead/PageTail handling Christoph Lameter
2007-09-11 6:04 ` [22/41] compound pages: Add new support functions Christoph Lameter
2007-09-11 6:04 ` [23/41] compound pages: vmstat support Christoph Lameter
2007-09-11 6:04 ` [24/41] compound pages: Use new compound vmstat functions in SLUB Christoph Lameter
2007-09-11 6:04 ` [25/41] compound pages: Allow use of get_page_unless_zero with compound pages Christoph Lameter
2007-09-11 6:04 ` [26/41] compound pages: Allow freeing of compound pages via pagevec Christoph Lameter
2007-09-11 6:04 ` [27/41] Large page order operations, zeroing and flushing Christoph Lameter
2007-09-11 6:04 ` [28/41] Futex: Fix PAGE SIZE assumption Christoph Lameter
2007-09-11 6:04 ` [29/41] Fix up reclaim counters Christoph Lameter
2007-09-11 6:04 ` [30/41] Add VM_BUG_ONs to check for correct page order Christoph Lameter
2007-09-11 6:04 ` [31/41] Large Blocksize: Core piece Christoph Lameter
2007-09-11 6:04 ` [32/41] Readahead changes to support large blocksize Christoph Lameter
2007-09-11 6:04 ` [33/41] Large blocksize support in ramfs Christoph Lameter
2007-09-11 6:04 ` [34/41] Large blocksize support for XFS Christoph Lameter
2007-09-11 6:04 ` [35/41] Reiserfs: Fix up mapping_set_gfp_mask() Christoph Lameter
2007-09-11 6:04 ` [36/41] 64k block size support for Ext2/3/4 Christoph Lameter
2007-09-11 6:04 ` [37/41] ext2: fix rec_len overflow for 64KB block size Christoph Lameter
2007-09-11 6:04 ` [38/41] ext3: fix rec_len overflow with " Christoph Lameter
2007-09-11 6:04 ` [39/41] ext4: fix rec_len overflow for " Christoph Lameter
2007-09-11 6:04 ` [40/41] Do not use f_mapping in simple_prepare_write() Christoph Lameter
2007-09-11 6:04 ` [41/41] Mmap support using pte PAGE_SIZE mappings Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200709111600.18756.nickpiggin@yahoo.com.au \
--to=nickpiggin@yahoo.com.au \
--cc=andrea@suse.de \
--cc=clameter@sgi.com \
--cc=dgc@sgi.com \
--cc=fengguang.wu@gmail.com \
--cc=hch@lst.de \
--cc=hugh@veritas.com \
--cc=jens.axboe@oracle.com \
--cc=joern@lazybastard.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maximlevitsky@gmail.com \
--cc=mel@skynet.ie \
--cc=pbadari@gmail.com \
--cc=torvalds@linux-foundation.org \
--cc=totty.lu@gmail.com \
--cc=wangswin@gmail.com \
--cc=wli@holomorphy.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.