From: Andrea Arcangeli <andrea@suse.de>
To: Andrew Morton <akpm@osdl.org>
Cc: torvalds@osdl.org, linux-kernel@vger.kernel.org
Subject: Re: objrmap-core-1 (rmap removal for file mappings to avoid 4:4 in <=16G machines)
Date: Tue, 9 Mar 2004 01:35:50 +0100
Message-ID: <20040309003550.GL12612@dualathlon.random>
In-Reply-To: <20040308161046.04270108.akpm@osdl.org>
On Mon, Mar 08, 2004 at 04:10:46PM -0800, Andrew Morton wrote:
> Andrea Arcangeli <andrea@suse.de> wrote:
> >
> > > btw, mincore() has always been broken with nonlinear vma's. If you could
> > > fix that up some time using that pagetable walker it would be nice. It's
> > > not very important though.
> >
> > Ok! I'm still late at this though, I wish I would be working on the
> > nonlinear stuff by now ;), I'm still stuck at the anon_vma_chain...
>
> As I say, broken mincore() on nonlinear mappings isn't a showstopper ;)
>
> > If I understand well, vmtruncate will also need the pagetable walker to
> > nuke all mappings of the last pages of the files before we free them
> > from the pagecache. So it should be a library call that mincore can use
> > too then, I don't see problems.
>
> If we want to bother with the traditional truncate-causes-SIGBUS semantics
> on nonlinear mappings, yes. I guess it would be best to do that if
> possible.

Yes, this was my objective.

btw, this reminds me of another problem: what to do in the case where,
in 2.4, we convert file-mapped pages into anonymous pages while they're
still mapped (I don't remember exactly what could do that, but it could
happen; do you remember the details? I think this is the case that Hugh
calls the Morton pages; he also had trouble with them in his anobjrmap
attempt, but I think it was more of a fixme comment). In 2.4 swap_out
had to deal with that somehow, but with my anobjrmap the vm will now
lose track of those pages, so they will become unswappable. I'm not sure
whether they were unswappable in 2.4 too, or whether 2.6-rmap could
leave them visible to the vm.

Also, if anything, these pages should be swapped to the swap device,
since they have lost their reference to the inode.

Input on the Morton pages is appreciated ;)

> > btw (for completeness), about the cpu consumption concerns about objrmap
> > w.r.t. security (that was Ingo's only argument against objrmap),
> > whatever malicious waste of cpu that could happen during paging, can be
> > already triggered in any kernel out there by using truncate on the same
> > mappings instead of swapping them out.
>
> Yes, malicious apps can DoS the machine in many ways. I'm more concerned
> about non-malicious ones getting hurt by the new search activity. Say, a
> single-threaded app which uses a huge number of vma's to map discontiguous
> parts of a file. The 2.4-style virtual scan would handle that OK, and the
> 2.6-style pte_chain walk would handle it OK too. People do weird things.

That's the normal db scenario, and it's running fine on 16G boxes today
in 2.4 with objrmap. Note that we're talking about swapping here, and
the cpu load of walking the vma chains is small compared to swap_out
throwing away hundreds of gigs of address space before swapping out the
first 4k (in fact objrmap was the first showstopper fix needed to make
huge-shm-swap work properly). So I'm not very concerned. Also, for some
dbs those pages should be mlocked, so to be optimal we should remove
them from the lru while they're mlocked, as Martin suggested. And normal
pure dbs (not applications) don't swap, so they will only run faster.

Longer term, on 64-bit, those weird setups will be anything but common,
as far as I can tell.

Overall it sounds like the best trade-off.

> (objrmap could perhaps terminate the vma walk after it sees the page->count
> fall to a value which means there are no more pte's mapping the page - that
> would halve the search cost on average).

It really should! Agreed. Feel free to go ahead and fix it. I checked
the page_count before starting the loop in my 2.4 implementation, but I
forgot to do that inside the core of the loop. I was only breaking out
of the loop at the first pte_young I found (my objrmap isn't capable of
clearing the pte_young bit; I leave that task to the pagetable walk. As
you know, my 2.4 objrmap is a hybrid between objrmap and the swap_out
loop; 2.6 handles all this differently).
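
To make the early-exit idea concrete, here is a rough userspace sketch.
It is not code from 2.4 or 2.6: struct toy_vma, walk_all(),
walk_early_exit() and the nr_mapped argument are hypothetical names, and
nr_mapped simply stands in for "the number of ptes still mapping the
page", which the real code would have to derive from page->count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one vma on a file's mapping list. */
struct toy_vma {
	bool maps_page;	/* true if this vma holds a pte for the page */
};

struct walk_result {
	size_t vmas_visited;	/* how many vmas the walk had to look at */
	size_t ptes_found;	/* how many of them mapped the page */
};

/* Baseline: visit every vma, even after all mappings have been found. */
static struct walk_result walk_all(const struct toy_vma *vmas, size_t n)
{
	struct walk_result r = { 0, 0 };

	assert(vmas != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		r.vmas_visited++;
		if (vmas[i].maps_page)
			r.ptes_found++;
	}
	return r;
}

/*
 * Early exit: stop as soon as nr_mapped ptes have been found.
 * nr_mapped is an assumption of this sketch (a count of ptes mapping
 * the page); if it is 0 the loop body never runs.
 */
static struct walk_result walk_early_exit(const struct toy_vma *vmas,
					  size_t n, size_t nr_mapped)
{
	struct walk_result r = { 0, 0 };

	assert(vmas != NULL || n == 0);
	for (size_t i = 0; i < n && r.ptes_found < nr_mapped; i++) {
		r.vmas_visited++;
		if (vmas[i].maps_page)
			r.ptes_found++;
	}
	return r;
}
```

With six vmas of which only the second and fourth map the page,
walk_all() looks at all six, while walk_early_exit() with nr_mapped = 2
stops after the fourth. How much is saved in practice depends on where
the mapping vmas sit in the list.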