public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Willy Tarreau <w@1wt.eu>
To: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>, X86 ML <x86@kernel.org>,
	Borislav Petkov <bp@alien8.de>, Brian Gerst <brgerst@gmail.com>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Kees Cook <keescook@chromium.org>
Subject: Re: [RFC PATCH v2 6/6] x86/entry/pti: don't switch PGD on when pti_disable is set
Date: Thu, 11 Jan 2018 19:05:02 +0100	[thread overview]
Message-ID: <20180111180502.GE15344@1wt.eu> (raw)
In-Reply-To: <CALCETrWb7esaT03RPyRqOzUPkYZ3Hxz1M_WwwkGtStwfYn3ehg@mail.gmail.com>

On Thu, Jan 11, 2018 at 09:53:26AM -0800, Andy Lutomirski wrote:
> >> So I think that no-pti mode is a privilege as opposed to a mode per
> >> se.  If you can turn off PTI, then you have the ability to read all of
> >> kernel memory  So maybe we should treat it as such.  Add a capability
> >> CAP_DISABLE_PTI.  If you have that capability (globally), then you can
> >> use the arch_prctl() or regular prctl() or whatever to turn PTI on.
> >> If you lose the cap, you lose no-pti mode as well.
> >
> > I disagree on this, because the only alternative I have is to decide
> > to keep my process running as root, which is even worse, as root can
> > much more easily escape from a chroot jail for example, or access
> > /dev/mem and read all the memory as well. Also tell Linus that he'll
> > have to build his kernels as root ;-)
> 
> Not since Linux 4.3 :)  You can set CAP_DISABLE_PTI as an "ambient"
> capability and let it be inherited by all descendents even if
> unprivileged.  This was all very carefully designed so that a program
> that inherited an ambient capability but tries to run a
> less-privileged child using pre-4.3 techniques will correctly drop the
> ambient capability, which is *exactly* what we want for PTI.

Ah thanks for explaining what these "ambient" capabilities are, I saw
the term a few times but never looked closer.

> So I stand by my suggestion.  Linus could still do:
> 
> $ nopti make -j512
> 
> and have it work, but trying to ptrace() the make process from outside
> the nopti process tree (and without doing nopti ptrace) will fail, as
> expected.  (Unless root does the ptrace, again as expected.)

This may be reasonable.

> > The arch_prctl() calls I proposed only allow to turn PTI off for
> > privileged users but any user can turn it back on. For me it's
> > important. A process might trust itself for its own use, but such
> > processes will rarely trust external processes in case they need to
> > perform an occasional fork+exec. Imagine for example a static web
> > server requiring to achieve the highest possible performance and
> > having to occasionally call logrotate to rotate+compress the logs.
> > It's better if the process knows how to turn PTI back on before
> > calling this.
> 
> In my proposal, CAP_DISABLE_PTI doesn't turn off PTI -- it just grants
> the ability to have PTI off.  If you have PTI off, you can turn it
> back in using prctl() or whatever.  So you call prctl() (to turn PTI
> back on) or capset() (to turn it on and drop the ability to turn it
> off).

Hmmm OK. I still don't like much to conflate the "turn back on" between
the two distinct calls. If the capability grants you the right to act
on prctl(), it should not perform the action in your back when disabled.
It may even lead people to care less about it which is not a good practise.

> How exactly do you plan to efficiently call logrotate from your
> webserver if the flag is per-mm, though?  You don't want to fork() in
> your fancy server because fork() write-protects the whole address
> space.

I'm not following you, what's the problem here ? I mean most programs
do fork() then close() a few FDs that were not marked CLOEXEC, then
execve(). The purpose is to avoid changing existing programs too much.

> So you use clone() or vfork() (like posix_spawn() does
> internally on a sane libc), but now you can't turn PTI back on if it's
> per-mm because you haven't changed your mm.

But as soon as you write anything it's cloned, right ? Ie you write in
the stack. I could say bullshit here, but surely we have a way to split
them.

> I really really think it should be per thread.

It could be a good argument here.

> >> As for per-mm vs per-thread, let's make it only switchable in
> >> single-threaded processes for now and inherited when threads are
> >> created.
> >
> > That's exactly what it does for now, but Linus doesn't like it at all.
> > So I'll switch it back to per-mm + per-CPU variable. Well he has a valid
> > point regarding the pgd and _PAGE_NX setting. point Now at least we know
> > the change is minimal if we have a good reason for doing differently
> > later.
> 
> Yuck, I hate this.  Are you really going to wire it up complete with
> all the IPIs needed to get the thing synced up right?  it's also going
> to run slower no matter what, I think, because you'll have to sync the
> per-mm state back to the TI flags on context switches.

At this point I'm lost, all I can do is trust you guys once you agree
on a solution :-)

> Linus, can you explain again why you think PTI should be a per-mm
> thing?  That is, why do you think it's useful and why do you think it
> makes logical sense from a user's POV?  I think the implementation is
> easier and saner for per-thread.  Also, if I we use a capability bit
> for it, making it per-mm gets really weird.

thanks,
Willy

  reply	other threads:[~2018-01-11 18:05 UTC|newest]

Thread overview: 104+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-09 12:56 [RFC PATCH v2 0/6] Per process PTI activation Willy Tarreau
2018-01-09 12:56 ` [RFC PATCH v2 1/6] x86/mm: add a pti_disable entry in mm_context_t Willy Tarreau
2018-01-09 12:56 ` [RFC PATCH v2 2/6] x86/arch_prctl: add ARCH_GET_NOPTI and ARCH_SET_NOPTI to enable/disable PTI Willy Tarreau
2018-01-09 14:17   ` Borislav Petkov
2018-01-09 14:36     ` Willy Tarreau
2018-01-09 14:51       ` Borislav Petkov
2018-01-09 14:54         ` Willy Tarreau
2018-01-09 21:26           ` Andy Lutomirski
2018-01-09 21:29             ` Borislav Petkov
2018-01-09 21:32               ` Willy Tarreau
2018-01-09 21:46                 ` Borislav Petkov
2018-01-09 22:06                   ` Willy Tarreau
2018-01-09 22:20                     ` Borislav Petkov
2018-01-09 22:29                       ` Dave Hansen
2018-01-09 22:40                       ` Willy Tarreau
2018-01-10 14:42                         ` Borislav Petkov
2018-01-10 15:39                           ` Willy Tarreau
2018-01-10 16:09                             ` Borislav Petkov
2018-01-10 16:19                               ` Willy Tarreau
2018-01-10 17:28                                 ` Borislav Petkov
2018-01-10  7:31                       ` Ingo Molnar
2018-01-10  7:37                         ` Willy Tarreau
2018-01-10  7:59                           ` Ingo Molnar
2018-01-09 23:53                     ` Andy Lutomirski
2018-01-10  4:25                       ` Willy Tarreau
2018-01-10  7:25               ` Ingo Molnar
2018-01-10 14:45                 ` Borislav Petkov
2018-01-10 15:43                   ` Willy Tarreau
2018-01-10 15:45                   ` Ingo Molnar
2018-01-09 21:34             ` Kees Cook
2018-01-09 21:41             ` Willy Tarreau
2018-01-09 21:50               ` Kees Cook
2018-01-09 22:03                 ` Willy Tarreau
2018-01-10  7:13             ` Ingo Molnar
2018-01-12 15:03   ` David Laight
2018-01-12 15:06     ` Willy Tarreau
2018-01-09 12:56 ` [RFC PATCH v2 3/6] x86/pti: add a per-cpu variable pti_disable Willy Tarreau
2018-01-10  7:19   ` Ingo Molnar
2018-01-10  7:29     ` Willy Tarreau
2018-01-10  8:01       ` Ingo Molnar
2018-01-10  8:50         ` Willy Tarreau
2018-01-10  8:59           ` Ingo Molnar
2018-01-10  9:00             ` Willy Tarreau
2018-01-09 12:56 ` [RFC PATCH v2 4/6] x86/pti: don't mark the user PGD with _PAGE_NX Willy Tarreau
2018-01-09 12:56 ` [RFC PATCH v2 5/6] x86/entry/pti: avoid setting CR3 when it's already correct Willy Tarreau
2018-01-10  7:16   ` Ingo Molnar
2018-01-10  7:18     ` Willy Tarreau
2018-01-10 20:29   ` Dave Hansen
2018-01-11  6:46     ` Willy Tarreau
2018-01-09 12:56 ` [RFC PATCH v2 6/6] x86/entry/pti: don't switch PGD on when pti_disable is set Willy Tarreau
2018-01-10  7:15   ` Ingo Molnar
2018-01-10  7:23     ` Willy Tarreau
2018-01-10  8:22   ` Peter Zijlstra
2018-01-10  9:11     ` Willy Tarreau
2018-01-10 19:21       ` Andy Lutomirski
2018-01-10 19:39         ` Willy Tarreau
2018-01-10 19:44           ` Andy Lutomirski
2018-01-10 19:50         ` Linus Torvalds
2018-01-10 20:04           ` Andy Lutomirski
2018-01-11  6:42           ` Willy Tarreau
2018-01-11 15:29             ` Dave Hansen
2018-01-11 15:44               ` Willy Tarreau
2018-01-11 15:51                 ` Dave Hansen
2018-01-11 17:02                   ` Andy Lutomirski
2018-01-11 18:21                     ` Alexei Starovoitov
2018-01-11 18:30                       ` Dave Hansen
2018-01-11 18:32                       ` Josh Poimboeuf
2018-01-11 18:36                         ` Linus Torvalds
2018-01-11 18:38                         ` Dave Hansen
2018-01-11 18:51                           ` Linus Torvalds
2018-01-11 18:57                             ` Dave Hansen
2018-01-11 19:05                               ` Josh Poimboeuf
2018-01-11 19:07                               ` Borislav Petkov
2018-01-11 19:17                                 ` Dave Hansen
2018-01-11 19:19                                   ` Olivier Galibert
2018-01-11 19:26                                     ` Josh Poimboeuf
2018-01-11 19:34                                       ` Alan Cox
2018-01-11 21:23                                         ` Willy Tarreau
2018-01-11 21:28                                           ` Linus Torvalds
2018-01-11 22:06                                             ` Willy Tarreau
2018-01-12 16:37                                               ` David Laight
2018-01-11 19:12                               ` Linus Torvalds
2018-01-11 19:38                               ` Alexei Starovoitov
2018-01-11 19:11                           ` Willy Tarreau
2018-01-11 20:00                     ` Dave Hansen
2018-01-11 17:09                 ` Andy Lutomirski
2018-01-11 17:40                   ` Willy Tarreau
2018-01-11 17:53                     ` Andy Lutomirski
2018-01-11 18:05                       ` Willy Tarreau [this message]
2018-01-11 18:15                         ` Dave Hansen
2018-01-11 18:31                           ` Linus Torvalds
2018-01-11 18:25                     ` Linus Torvalds
2018-01-11 18:26                       ` Linus Torvalds
2018-01-11 19:33                         ` Andy Lutomirski
2018-01-12 20:22                           ` Ingo Molnar
2018-01-12 21:18                             ` Andy Lutomirski
2018-01-12 21:54                               ` Willy Tarreau
2018-01-11 21:59                       ` Willy Tarreau
2018-01-12 16:27                       ` David Laight
2018-01-12 17:55                         ` Linus Torvalds
2018-01-12 19:36                           ` Willy Tarreau
2018-01-11 18:35                 ` Dave Hansen
2018-01-11 21:49                   ` Willy Tarreau
  -- strict thread matches above, loose matches on Subject: below --
2018-01-11 23:11 Alexey Dobriyan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180111180502.GE15344@1wt.eu \
    --to=w@1wt.eu \
    --cc=bp@alien8.de \
    --cc=brgerst@gmail.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hpa@zytor.com \
    --cc=jpoimboe@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox