From: Andrea Arcangeli <andrea@suse.de>
To: Hugh Dickins <hugh@veritas.com>
Cc: Ingo Molnar <mingo@elte.hu>, Andrew Morton <akpm@osdl.org>,
torvalds@osdl.org, linux-kernel@vger.kernel.org,
William Lee Irwin III <wli@holomorphy.com>
Subject: Re: anon_vma RFC2
Date: Thu, 11 Mar 2004 14:56:08 +0100
Message-ID: <20040311135608.GI30940@dualathlon.random>
In-Reply-To: <Pine.LNX.4.44.0403111248450.1402-100000@localhost.localdomain>
Hi Hugh,
On Thu, Mar 11, 2004 at 01:23:24PM +0000, Hugh Dickins wrote:
> Hi Andrea,
>
> On Thu, 11 Mar 2004, Andrea Arcangeli wrote:
> >
> > this is the full current status of my anon_vma work. Now fork() and all
> > the other page_add/remove_rmap in memory.c plus the paging routines
> > seems fully covered and I'm now dealing with the vma merging and the
> > anon_vma garbage collection (the latter is easy but I need to track all
> > the kmem_cache_free).
>
> I'm still making my way through all the relevant mails, and not even
> glanced at your code yet: I hope later today. But to judge by the
> length of your essay on vma merging, it strikes me that you've taken
> a wrong direction in switching from my anon mm to your anon vma.
>
> Go by vmas and you have tiresome problems as they are split and merged,
> very commonly. Plus you have the overhead of new data structure per vma.
It's more complicated because it's more fine-grained and it can handle
mremap too. I mean, the additional cost of tracking the vmas pays off
because then we have a tiny list of vmas to search for every page;
otherwise, with the mm-wide model, we'd need to search all of the vmas in
an mm. This is quite important during swapping with tons of vmas. Note
that in my common case the page will point directly to the vma
(PageDirect(page) == 1), no find_vma or whatever needed in between.
The per-vma overhead is 12 bytes: 2 pointers for the list node and 1
pointer to the anon_vma. As said above it provides several advantages,
but you're certainly right that the mm approach had no vma overhead.
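A minimal userspace sketch of that per-vma bookkeeping, assuming 4-byte
pointers for the 12-byte figure. The names vma_anon_fields,
anon_vma_node, anon_vma_link and anon_vma_count are made up for
illustration and are not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list node, modeled on the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical layout: an anon_vma is just the head of a list of vmas. */
struct anon_vma {
	struct list_head head;
};

/* Hypothetical: only the fields the anon_vma design adds to each vma. */
struct vma_anon_fields {
	struct list_head anon_vma_node;	/* 2 pointers: link in anon_vma->head */
	struct anon_vma *anon_vma;	/* 1 pointer: the owning anon_vma */
};

static void anon_vma_init(struct anon_vma *av)
{
	av->head.next = av->head.prev = &av->head;
}

/* Link a vma at the tail of the anon_vma's list. */
static void anon_vma_link(struct anon_vma *av, struct vma_anon_fields *vma)
{
	struct list_head *node = &vma->anon_vma_node;

	node->prev = av->head.prev;
	node->next = &av->head;
	av->head.prev->next = node;
	av->head.prev = node;
	vma->anon_vma = av;
}

/* Number of vmas a reverse-mapping walk would visit for one page. */
static size_t anon_vma_count(const struct anon_vma *av)
{
	size_t n = 0;
	const struct list_head *pos;

	for (pos = av->head.next; pos != &av->head; pos = pos->next)
		n++;
	return n;
}

/* Bytes added per vma: three pointers, i.e. 12 with 4-byte pointers. */
static size_t anon_vma_per_vma_overhead(void)
{
	return sizeof(struct vma_anon_fields);
}
```

The point of the sketch is only that the walk is bounded by the vmas
linked to one anon_vma (typically a parent and its forked children),
not by every vma in the mm.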
I'm quite convinced the anon_vma is the optimal design, though it's not
running yet ;). However it's close to compiling. The whole vma and page
layer is finished (including the vma merging). I'm now dealing with the
swapcache stuff and I'm doing it slightly differently from your
anobjrmap-2 patch (obviously I also reinstantiate the PG_swapcache
bitflag, but the fundamental difference is that I don't drop the
swapper_space):
static inline struct address_space * page_mapping(struct page * page)
{
	extern struct address_space swapper_space;
	struct address_space * mapping = NULL;

	if (PageSwapCache(page))
		mapping = &swapper_space;
	else if (!PageAnon(page))
		mapping = page->as.mapping;
	return mapping;
}
I want the same pagecache/swapcache code to work transparently, but I
free up page->index and page->mapping for the swapcache, so that
I can reuse them to track the anon_vma. I think the above is simpler than
killing the swapper_space completely as you did. My solution spares me
hacks like this:
 	if (mapping && mapping->a_ops && mapping->a_ops->sync_page)
 		return mapping->a_ops->sync_page(page);
+	if (PageSwapCache(page))
+		blk_run_queues();
 	return 0;
 }
It also spares me reworking set_page_dirty to call __set_page_dirty_buffers
by hand. I mean, it's less intrusive.
The CPU cost is similar, though I pay for an additional compare in
page_mapping, and the code looks cleaner. Could be my opinion
only though ;).
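To make the three cases concrete, here is a userspace mock of the
page_mapping() dispatch. It is a sketch, not kernel code: the flag bit
values, the reduced struct page and the stub address_space are
assumptions; only the union name page->as and the branch logic follow
what is described above:

```c
#include <assert.h>
#include <stddef.h>

struct address_space { int unused; };	/* stub, assumption */
struct anon_vma;			/* opaque here */

/* Assumed flag bits, for illustration only. */
#define PG_SWAPCACHE	(1u << 0)
#define PG_ANON		(1u << 1)

/* Reduced struct page: the same word holds the mapping or the anon_vma. */
struct page {
	unsigned int flags;
	union {
		struct address_space *mapping;	/* file-backed pages */
		struct anon_vma *anon_vma;	/* anonymous pages */
	} as;
};

static struct address_space swapper_space;

static int PageSwapCache(const struct page *page)
{
	return (page->flags & PG_SWAPCACHE) != 0;
}

static int PageAnon(const struct page *page)
{
	return (page->flags & PG_ANON) != 0;
}

/*
 * Swapcache pages report swapper_space, anonymous pages outside the
 * swapcache report no mapping, and file pages report the stored mapping.
 */
static struct address_space *page_mapping(struct page *page)
{
	struct address_space *mapping = NULL;

	if (PageSwapCache(page))
		mapping = &swapper_space;
	else if (!PageAnon(page))
		mapping = page->as.mapping;
	return mapping;
}
```

Since swapcache pages never read page->as.mapping in this scheme, the
field stays free to carry the anon_vma pointer while generic pagecache
code still sees a valid mapping.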
> If your design magicked those problems away somehow, okay, but it seems
> you're finding issues with it: I think you should go back to anon mms.
The only issue I've found so far is that to track things in a
fine-grained way I have to forbid merging sometimes. Note that
forbidding merging is a feature too: if I went down the route of a
pagetable scan on the vma to fix up all page->as.vma/anon_vma and
page->index, I would lose some historic information on the origin of
certain vmas, and I would eventually fall back to the mm-wide
information if I did total merging.
I think the probability of forbidden merging is low enough that it
doesn't matter. Also it doesn't impact the file merging in any way.
It basically merges as well as the file merging. Right now I'm also not
overriding the initial vm_pgoff given to brand new anonymous vmas, but
I could, to boost the merging with mremapped segments. Though I don't
think it's necessary.
Overall the main reason for keeping track of vmas rather than of the
mm is to be able to handle mremap as efficiently as 2.4 does. I mean,
your anobjrmap-5 simply reinstantiates the pte_chains, so the vm then has
to deal with both pte_chains and anonmm.
> Go by mms, and there's only the exceedingly rare (does it ever occur
> outside our testing?) awkward case of tracking pages in a private anon
> vma inherited from parent, when parent or child mremaps it with MAYMOVE.
>
> Which I reused the pte_chain code for, but it's probably better done
> by conjuring up an imaginary tmpfs object as backing at that point
> (that has its own little cost, since the object lives on at full size
> until all its mappers unmap it, however small the portion they have
> mapped). And the overhead of the new data structure is per mm only.
>
> I'll get back to reading through the mails now: sorry if I'm about to
> find the arguments against anonmm in my reading. (By the way, several
> times you mention the size of a 2.6 struct page as larger than a 2.4
> struct page: no, thanks to wli and others it's the 2.6 that's smaller.)
Really? Mainline 2.6 has the same size as mainline 2.4 (48 bytes), or
am I counting wrong? (At least my 2.4-aa tree is 48 bytes too, and I
think 2.4 mainline is as well.) objrmap adds 4 bytes (going to 52 bytes);
my patch removes 8 bytes (i.e. the pte_chain), so the result of my patch
is 4 bytes less than 2.4 and 2.6 (44 bytes instead of 48 bytes). I wanted
to nuke the mapcount too, but that destroys the nr_mapped info, and that
is spread all over, so for now I keep the page->mapcount ;)
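The byte counts above can be written out as a small sketch. The
constants are simply the figures quoted in this message, not values
measured from any tree, and the identifier names are made up:

```c
#include <assert.h>

enum {
	STRUCT_PAGE_MAINLINE = 48,	/* 2.4 and 2.6 mainline, as counted above */
	OBJRMAP_ADDED = 4,		/* bytes objrmap adds */
	PTE_CHAIN_REMOVED = 8,		/* bytes freed by dropping the pte_chain */
};

/* struct page size with objrmap applied. */
static int struct_page_objrmap(void)
{
	return STRUCT_PAGE_MAINLINE + OBJRMAP_ADDED;
}

/* struct page size with the anon_vma patch on top of objrmap. */
static int struct_page_anon_vma(void)
{
	return struct_page_objrmap() - PTE_CHAIN_REMOVED;
}

/* Net saving per page relative to mainline. */
static int saved_vs_mainline(void)
{
	return STRUCT_PAGE_MAINLINE - struct_page_anon_vma();
}
```

With these inputs the intermediate size is 52 bytes, the final size is
44 bytes, and the net saving is 4 bytes per struct page.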