public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Waiman Long <longman@redhat.com>, Ingo Molnar <mingo@kernel.org>,
	Will Deacon <will@kernel.org>,
	Bernd Edlinger <bernd.edlinger@hotmail.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Alexey Gladkov <gladkov.alexey@gmail.com>
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1
Date: Mon, 06 Apr 2020 17:17:27 -0500	[thread overview]
Message-ID: <87blo45keg.fsf@x220.int.ebiederm.org> (raw)
In-Reply-To: <CAHk-=wjxyGCj9675mf31uhoJCyHn74ON_+O6SjSqBSSvqWxC1Q@mail.gmail.com> (Linus Torvalds's message of "Fri, 3 Apr 2020 12:26:51 -0700")

Linus Torvalds <torvalds@linux-foundation.org> writes:

> [ For Waiman & co - the problem is that the current cred_guard_mutex
> is horrendous and has problems with execve() deadlocking against
> various users. We've had this bug before, there's a new one, it's just
> nasty ]
>
> On Thu, Apr 2, 2020 at 4:04 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> That is not the direction I intend to take either.
>>
>> I was hoping I could put off replying to this thread for a bit because
>> I only managed to get 4 hours of sleep last night and I am not as alert
>> to technical details as I would like to be.
>
> Hmm.. So I've been looking at this cred_guard_mutex, and I wonder...
>
> This is a bit hand-wavy, because I haven't walker through all the
> paths, but could we perhaps work around a lot of the problems a
> different way., namely:
>
>  - make the "cred_guard_mutex" an rwsem-like thing instead of being a mutex.
>
>  - make the ptrace_attach() case get it for writing - not because
> ptrace changes the creds, but because ptrace changes 'task->ptrace'
> and depends on dumpability etc.
>
>  - change the *name* of that damn thing. Not because it's now
> rwsem'ish rather than a mutex, but because it was never really about
> just "creds". It was about creds+ptrace+dumpable flags etc.
>
>  - make all the ones that read the creds to just take it for reading
> (IOW, the cases that were basically switched over to
> exec_update_mutex).
>
>  - and finally: make "execve()" take it just for reading too, but
> introduce a "upgrade to write" at the very end (when it actually is
> all done and then finally changes the creds and dumpability)
>
> Wouldn't that solve all problems? We wouldn't get deadlocks wrt
> execve(), simply because execve() doesn't need it to be writable, and
> the things execve() does and can deadlock all only want readability.
>
> But hear me out, because the above is fundamentally broken in a couple
> of ways, so let me address that brokenness before you tell me I'm a
> complete nincompoop and an idiot.
>
> I'm including some locking people here because of these issues, so
> that they can maybe verify my thinking.
>
>  (a) our rwsem's are fair
>
>      So the whole "execve takes it for reading, so now others can take
> it for reading too without deadlocks" is simply not true - if you use
> the existing rwsem.
>
>      Because a concurrent (blocked) writer will then block other
> readers for fairness reasons, and holding it for reading doesn't
> guarantee that others can get it for reading.
>
>      So clearly, the above doesn't even *fix* the deadlocks - unless
> we have an unfair mode (or just a special lock for just this that is
> not our standard rwsem, but a special unfair one).
>
>      So I'm suggesting we use a special unfair rwsem here (we can make
> a simple spinlock-based one - it doesn't need to be as clever or
> optimized as the real rwsems are)
>
>  (b) similarly, our rwsem's don't actually have a "upgrade from read
> to write", because that's also a fundamentally deadlocky operation.
>
>      Again, that's true. Except execve() is special, and we know
> there's only _one_ execve() at a time that will complete, since we're
> serializing them. So for this particular use, "upgrade to write" would
> be possible without the general-case deadlock issues.
>
>  (c) I didn't think things through, and even with these special
> semantics, my idea is complete garbage
>
>      Ok, this may well be true.
>
> Anyway, the advantage of this (if it works) is that it would allow us
> to go back to the _really_ simple original model of just taking this
> lock for reading at the beginning of execve(), and not worrying so
> much about complex nesting or very complex rules for exactly when we
> got the lock and error handling.
>
> The final part when we actually update the credentials and dumpability
> and stuff in execve() is actually fairly simple. So the "upgrade to a
> write lock" phase doesn't worry me too much.  It's the interaction
> with all the previous parts (which happen with it held just for
> reading) that tend to be the nastier ones.
>
> And ptrace_attach() really is special, and I think it would be the
> only one that really needs that write lock.
>
> The disadvantage, of course, is that it would require that
> special-case lock semantic, and I might also be missing some thing
> that makes it not work anyway.
>
> Comments? Am I just dreaming of a simpler model without my medications
> again?

Withough reading everything through at least.

* There is also security_setprocattr which needs ptrace and nnp state not to
  change it needs to set something that at least selinux's cred
  calculations needs to remain constant (like nnp and ptrace).

  Which means one thread calling security_setprocattr and another thread
  calling exec can deadlock in de_thread.

* Even with your lock and just the ptrace case I can deadlock.
  Ptracer:                  Thread A               Tread B
     ptrace_attach A
                                                   exec
     ptrace_attach B
                                                   uprade R to RW
     ---------------------- DEADLOCKED -------------------------

Those are the first two cases I have thought of.  There are probably
more.



But fundamentally the only reason we need this information stable
before the point of no return is so that we can return a nice error
code to the process calling exec.  Instead of terminating the
process with SIGSEGV.

These are for the most part unlikely scenarios or people would have been
complaining much more loudly about deadlock.

So my plan is to perform the relevant calculations effectively twice.
Once just before the point of no return, and give a graceful return code
if necessary and possible.  Once just afer the point of no return, and
SIGSEGV if necessary.



Of course this all only applies to LSMs that refuse to continue under
NNP or ptrace without changing the cred.  Linux without those LSMs
enabled will just continue with the original credentials.


So I don't think we will noticably be sacraficing the quality of
the user experience with my plan.  In the worst case a deadlock
will become a SIGSEGV killing the execing program.

Eric

p.s. Yes we can do better than a mutex that makes everything mutually
     exclusive. I am just starting there for simplicity, and to
     see if we need anything better.  Unfortuantely too many things are
     changing simultaneously for rcu to cover all of the read side
     cases.


  parent reply	other threads:[~2020-04-06 22:20 UTC|newest]

Thread overview: 128+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <87blobnq02.fsf@x220.int.ebiederm.org>
2020-04-02 19:04 ` [GIT PULL] Please pull proc and exec work for 5.7-rc1 Linus Torvalds
2020-04-02 19:31   ` Bernd Edlinger
2020-04-02 19:52     ` Linus Torvalds
2020-04-02 20:59       ` Bernd Edlinger
2020-04-02 21:46         ` Linus Torvalds
2020-04-02 23:01           ` Eric W. Biederman
2020-04-02 23:42             ` Bernd Edlinger
2020-04-02 23:45               ` Eric W. Biederman
2020-04-02 23:49                 ` Bernd Edlinger
2020-04-02 23:45               ` Linus Torvalds
2020-04-02 23:44             ` Linus Torvalds
2020-04-03  0:05               ` Eric W. Biederman
2020-04-07  1:29               ` [RFC][PATCH 0/3] exec_update_mutex related cleanups Eric W. Biederman
2020-04-07  1:31                 ` [PATCH 1/3] binfmt: Move install_exec_creds after setup_new_exec to match binfmt_elf Eric W. Biederman
2020-04-07 15:58                   ` Kees Cook
2020-04-07 16:11                   ` Christian Brauner
2020-04-08 17:25                   ` Linus Torvalds
2020-04-08 19:51                     ` Eric W. Biederman
2020-04-07  1:31                 ` [PATCH 2/3] exec: Make unlocking exec_update_mutex explict Eric W. Biederman
2020-04-07 16:02                   ` Kees Cook
2020-04-07 16:17                   ` Christian Brauner
2020-04-07 16:21                     ` Eric W. Biederman
2020-04-07  1:32                 ` [PATCH 3/3] exec: Rename the flag called_exec_mmap point_of_no_return Eric W. Biederman
2020-04-07 16:03                   ` Kees Cook
2020-04-07 16:21                   ` Christian Brauner
2020-04-07 16:22                 ` [RFC][PATCH 0/3] exec_update_mutex related cleanups Christian Brauner
2020-04-08 17:26                 ` Linus Torvalds
2020-04-03  5:09             ` [GIT PULL] Please pull proc and exec work for 5.7-rc1 Bernd Edlinger
2020-04-03 19:26             ` Linus Torvalds
2020-04-03 20:41               ` Waiman Long
2020-04-03 20:59                 ` Linus Torvalds
2020-04-03 23:16                   ` Waiman Long
2020-04-03 23:23                     ` Waiman Long
2020-04-04  1:30                       ` Linus Torvalds
2020-04-04  2:02                         ` Waiman Long
2020-04-04  2:28                           ` Linus Torvalds
2020-04-04  6:34                             ` Bernd Edlinger
2020-04-05  6:34                               ` Bernd Edlinger
2020-04-05 19:35                                 ` Linus Torvalds
2020-04-05  2:42                             ` Waiman Long
2020-04-05  3:35                               ` Bernd Edlinger
2020-04-05  3:45                                 ` Waiman Long
2020-04-06 13:13                             ` Will Deacon
2020-04-04  4:23                     ` Bernd Edlinger
2020-04-06 22:17               ` Eric W. Biederman [this message]
2020-04-07 19:50                 ` Linus Torvalds
2020-04-07 20:29                   ` Bernd Edlinger
2020-04-07 20:47                     ` Linus Torvalds
2020-04-08 15:14                   ` Eric W. Biederman
2020-04-08 15:21                     ` Bernd Edlinger
2020-04-08 16:34                     ` Linus Torvalds
2020-04-09 14:58                       ` Eric W. Biederman
2020-04-09 15:15                         ` Bernd Edlinger
2020-04-09 16:15                         ` Linus Torvalds
2020-04-09 16:24                           ` Linus Torvalds
2020-04-09 17:03                             ` Eric W. Biederman
2020-04-09 17:17                               ` Bernd Edlinger
2020-04-09 17:37                                 ` Linus Torvalds
2020-04-09 17:46                                   ` Bernd Edlinger
2020-04-09 18:36                                     ` Linus Torvalds
2020-04-09 19:42                                       ` Linus Torvalds
2020-04-09 19:57                                         ` Bernd Edlinger
2020-04-09 20:04                                           ` Linus Torvalds
2020-04-09 20:36                                             ` Bernd Edlinger
2020-04-09 21:00                                             ` Eric W. Biederman
2020-04-09 21:17                                               ` Linus Torvalds
2020-04-09 23:52                                                 ` Bernd Edlinger
2020-04-10  0:30                                                 ` Linus Torvalds
2020-04-10  0:32                                                   ` Linus Torvalds
2020-04-11  4:07                                                     ` Bernd Edlinger
2020-04-11 18:20                                                   ` Oleg Nesterov
2020-04-11 18:29                                                     ` Linus Torvalds
2020-04-11 18:31                                                       ` Linus Torvalds
2020-04-11 19:15                                                       ` Bernd Edlinger
2020-04-11 20:07                                                         ` Linus Torvalds
2020-04-11 21:16                                                           ` Bernd Edlinger
     [not found]                                                             ` <CAHk-=wgWHkBzFazWJj57emHPd3Dg9SZHaZqoO7-AD+UbBTJgig@mail.gmail.com>
2020-04-11 21:57                                                               ` Linus Torvalds
2020-04-12  6:01                                                                 ` Bernd Edlinger
2020-04-12 19:50                                                       ` Oleg Nesterov
2020-04-12 20:14                                                         ` Linus Torvalds
2020-04-28  2:56                                                           ` Bernd Edlinger
2020-04-28 17:07                                                             ` Linus Torvalds
2020-04-28 19:08                                                               ` Oleg Nesterov
2020-04-28 20:35                                                                 ` Linus Torvalds
2020-04-28 21:06                                                                   ` Jann Horn
2020-04-28 21:36                                                                     ` Linus Torvalds
2020-04-28 21:53                                                                       ` Jann Horn
2020-04-28 22:14                                                                         ` Linus Torvalds
2020-04-28 23:36                                                                           ` Jann Horn
2020-04-29 17:58                                                                             ` Linus Torvalds
2020-04-29 18:33                                                                               ` Jann Horn
2020-04-29 18:57                                                                                 ` Linus Torvalds
2020-04-29 19:23                                                                               ` Bernd Edlinger
2020-04-29 19:26                                                                                 ` Jann Horn
2020-04-29 20:19                                                                                   ` Bernd Edlinger
2020-04-29 21:06                                                                                     ` Jann Horn
2020-04-29 22:38                                                                                 ` Linus Torvalds
2020-04-29 23:22                                                                                   ` Linus Torvalds
2020-04-29 23:59                                                                                     ` Jann Horn
2020-04-30  1:08                                                                                       ` Bernd Edlinger
2020-04-30  2:20                                                                                         ` Linus Torvalds
2020-04-30  3:00                                                                                           ` Jann Horn
2020-04-30  3:25                                                                                             ` Linus Torvalds
2020-04-30  3:41                                                                                               ` Jann Horn
2020-04-30  3:50                                                                                                 ` Linus Torvalds
2020-04-30 13:37                                                                                               ` Linus Torvalds
2020-04-30  2:16                                                                                       ` Linus Torvalds
2020-04-30 13:39                                                                                         ` Bernd Edlinger
2020-04-30 13:47                                                                                           ` Linus Torvalds
2020-04-30 14:29                                                                                             ` Bernd Edlinger
2020-04-30 16:40                                                                                               ` Linus Torvalds
2020-05-02  4:11                                                                                                 ` Bernd Edlinger
2025-08-24 22:28                                                                     ` Bernd Edlinger
2020-04-09 17:36                               ` Linus Torvalds
2020-04-09 20:34                                 ` Eric W. Biederman
2020-04-09 20:56                                   ` Linus Torvalds
2020-04-02 23:02           ` Bernd Edlinger
2020-04-02 23:22           ` Bernd Edlinger
2020-04-03  7:38           ` Bernd Edlinger
2020-04-03 16:00       ` Bernd Edlinger
2020-04-03 15:09   ` Bernd Edlinger
2020-04-03 16:23     ` Linus Torvalds
2020-04-03 16:36       ` Bernd Edlinger
2020-04-04  5:43       ` Bernd Edlinger
2020-04-04  5:48         ` Bernd Edlinger
2020-04-06  6:41           ` Bernd Edlinger
2020-04-10 13:03 ` [GIT PULL] proc fix " Eric W. Biederman
2020-04-10 20:40   ` pr-tracker-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87blo45keg.fsf@x220.int.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=bernd.edlinger@hotmail.de \
    --cc=gladkov.alexey@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mingo@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox