public inbox for linux-kernel@vger.kernel.org
From: Andrea Arcangeli <andrea@suse.de>
To: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjanv@redhat.com>,
	Linus Torvalds <torvalds@osdl.org>, Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org
Subject: Re: objrmap-core-1 (rmap removal for file mappings to avoid 4:4 in <=16G machines)
Date: Wed, 10 Mar 2004 13:32:50 +0100
Message-ID: <20040310123250.GG30940@dualathlon.random>
In-Reply-To: <20040310113501.GA1112@elte.hu>

On Wed, Mar 10, 2004 at 12:35:01PM +0100, Ingo Molnar wrote:
> 
> * Andrea Arcangeli <andrea@suse.de> wrote:
> 
> > the quality of such objrmap patch is still better than rmap. The DoS
> > thing is doable with vmtruncate too in any kernel out there.
> 
> objrmap for now has a serious problem: test-mmap3.c locked up my box (i
> couldnt switch text consoles for 30 minutes when i turned the box off).
> 
> I'm sure you'll fix it and i'm looking forward seeing it.  However, i'd
> like to see the full fix instead of a promise to have this fixed
> sometime the future.  There are valid application workloads that trigger
> _worse_ vma patterns than test-mmap3.c does (UML being one such thing,
> Oracle with indirect buffer-cache another - i'm sure there are other
> apps too.).  Calling these applications 'exploits' doesnt help in
> getting this thing fixed.  There's no problem with keeping this patchset
> separate until it's regression-free.
> 
> > merging objrmap is the first step. Any other effort happens on top of
> > it.
> 
> i'd like to see that effort combined with this code, and the full
> picture.  Since this 'DoS property' is created by the current concept of
> the patch, it's not a 'bug' that is easily fixed so we must not (and
> cannot) sign up for it blindly, without seeing the full impact.  But
> yes, it might be fixable.  Anyway - the 2.6 kernel is a stable tree and
> i'm sure you know that avoiding regression is more important than
> anything else.

I'm fine with waiting for the whole work to be finished and merging it
all at once (still as separate incremental patches) instead of merging
it in steps into mainline. Your longer-term confidence in our work is
promising, thanks.

Since I need this fixed fast, I may have to go the rbtree way to be
safe (mainline could go with prio_trees in the long run instead).

However, I still disagree that the objrmap I posted is a regression for
applications like Oracle (dunno about uml). It's an obvious regression
for your test-mmap3.c and that's why I call test-mmap3.c an exploit and
not a "real app". Nobody would map 1 page per vma, get real, you'll have
a hard time convincing me a real app is going to scatter vmas with a 4k
aperture each. You wrote the very worst case that everybody is aware
of, a real app scenario would not do that. Note that there's quite a
huge amount of merging of file-vmas, and you absolutely prevent that too.
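
For readers following the thread, the one-page-per-vma pattern under
discussion can be sketched in userspace as below. This is not the actual
test-mmap3.c (its source is not part of this message); the helper name
map_one_page_vmas and the scratch file are assumptions made for
illustration. Each call maps the same file offset, so neighbouring
mappings are not contiguous in the file and are expected to stay
separate one-page vmas rather than being merged.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not test-mmap3.c): map the first page of a
 * scratch file `count` times, storing each address in maps[].
 * Returns the number of mappings actually created.
 */
size_t map_one_page_vmas(void **maps, size_t count)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t done = 0;
    FILE *f;

    assert(maps != NULL);
    if (page <= 0)
        return 0;

    f = tmpfile();                      /* anonymous scratch file */
    if (f == NULL)
        return 0;
    if (ftruncate(fileno(f), page) != 0) {
        fclose(f);
        return 0;
    }

    for (done = 0; done < count; done++) {
        /* Same offset (0) every time: one page per mapping. */
        void *p = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED,
                       fileno(f), 0);
        if (p == MAP_FAILED)
            break;
        maps[done] = p;
    }

    fclose(f);      /* existing mappings stay valid after close */
    return done;
}
```

The caller is responsible for munmap()ing each returned address. With a
large count, every one of these vmas ends up on the same file's mapping
list, which is the worst case being argued about here.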

Furthermore you said Oracle needs mlock to work "safe" with rmap. But with
2.6 if you use mlock it will still not work. If you use 2.6+objrmap
mlock will fix your DoS scenario too, and Oracle will work as fast as
rmap+mlock in your rmap 2.4 implementation.

Also you're advocating "merging in steps" and keeping "2.6
optimal", but you're ignoring the single reason you are forced to ship a
2.4 kernel with 4:4 for every >4G machine. 2.6 mainline (the current
2.6.3 step) has no way to be compiled with the 4:4 model. So the current
great 2.6 kernel has no way to work with any machine >4G (if you ship
all PAE kernels with rmap compiled with 4:4 you must agree 2.6 mainline
has no way to work on any machine with >4G of ram, so you should not be
surprised that I'm dealing with those issues currently). Is 2.6 a
high-end kernel with rmap? I supported 4G on x86 the first time with
bigmem in 2.2.

Solving the problem by merging 4:4 instead of removing rmap is not the
way to go IMHO since it doesn't fix the memory waste for 64bit archs
compared to what we can do with 2.4 _mainline_ (64bit doesn't need
pte-highmem and there are no highmem issues to solve there).

At least with objrmap applied to 2.6, there would be a chance to survive
the load on >4G boxes in a 2.6 mainline kernel.  Sure, you'd better be
careful not to swap out heavily or it would risk hanging badly (if the
app isn't using mlock; if the app uses mlock 2.6 will fly), but without
objrmap it would lock up before you can worry about reaching swap (mlock
or not).

So in practice I think it would have been ok to merge objrmap as an
intermediate step (it's not that I didn't evaluate those possibilities
when I submitted it).

As for the DoS thing in security terms, truncate has the same issue. It
may be easier to kill the "exploit" since it returns to userspace every
time, and userspace is not swapped out when it happens, but it would
still waste an indefinite amount of time in kernel space. So providing
an efficient means of doing the i_mmap vma lookup is a problem
independent of the objrmap patch for the vm, I think we agree on this.
Doing that will fix all users (so the vm too).
