From: Joel Fernandes <joel@joelfernandes.org>
To: Christian Brauner <christian@brauner.io>
Cc: "Daniel Colascione" <dancol@google.com>,
"Suren Baghdasaryan" <surenb@google.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Sultan Alsawaf" <sultan@kerneltoast.com>,
"Tim Murray" <timmurray@google.com>,
"Michal Hocko" <mhocko@kernel.org>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
"Arve Hjønnevåg" <arve@android.com>,
"Todd Kjos" <tkjos@android.com>,
"Martijn Coenen" <maco@android.com>,
"Ingo Molnar" <mingo@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
LKML <linux-kernel@vger.kernel.org>,
"open list:ANDROID DRIVERS" <devel@driverdev.osuosl.org>,
linux-mm <linux-mm@kvack.org>,
kernel-team <kernel-team@android.com>,
"Oleg Nesterov" <oleg@redhat.com>,
"Andy Lutomirski" <luto@amacapital.net>,
"Serge E. Hallyn" <serge@hallyn.com>,
"Kees Cook" <keescook@chromium.org>
Subject: Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android
Date: Tue, 19 Mar 2019 21:52:49 -0400 [thread overview]
Message-ID: <20190320015249.GC129907@google.com> (raw)
In-Reply-To: <20190319231020.tdcttojlbmx57gke@brauner.io>
On Wed, Mar 20, 2019 at 12:10:23AM +0100, Christian Brauner wrote:
> On Tue, Mar 19, 2019 at 03:48:32PM -0700, Daniel Colascione wrote:
> > On Tue, Mar 19, 2019 at 3:14 PM Christian Brauner <christian@brauner.io> wrote:
> > > So I dislike the idea of allocating new inodes from the procfs super
> > > block. I would like to avoid pinning the whole pidfd concept exclusively
> > > to proc. The idea is that the pidfd API will be useable through procfs
> > > via open("/proc/<pid>") because that is what users expect and really
> > > wanted to have for a long time. So it makes sense to have this working.
> > > But it should really be useable without it. That's why translate_pid()
> > > and pidfd_clone() are on the table. What I'm saying is, once the pidfd
> > > api is "complete" you should be able to set CONFIG_PROCFS=N - even
> > > though that's crazy - and still be able to use pidfds. This is also a
> > > point akpm asked about when I did the pidfd_send_signal work.
> >
> > I agree that you shouldn't need CONFIG_PROCFS=Y to use pidfds. One
> > crazy idea that I was discussing with Joel the other day is to just
> > make CONFIG_PROCFS=Y mandatory and provide a new get_procfs_root()
> > system call that returned, out of thin air and independent of the
> > mount table, a procfs root directory file descriptor for the caller's
> > PID namspace and suitable for use with openat(2).
>
> Even if this works I'm pretty sure that Al and a lot of others will not
> be happy about this. A syscall to get an fd to /proc? That's not going
> to happen and I don't see the need for a separate syscall just for that.
> (I do see the point of making CONFIG_PROCFS=y the default btw.)
I think his point here was that he wanted a handle to procfs no matter where
it was mounted and then can later use openat on that. Agreed that it may be
unnecessary unless there is a usecase for it, and especially if the /proc
directory being the defacto mountpoint for procfs is a universal convention.
> Inode allocation from the procfs mount for the file descriptors Joel
> wants is not correct. Their not really procfs file descriptors so this
> is a nack. We can't just hook into proc that way.
I was not particular about using procfs mount for the FDs but that's the only
way I knew how to do it until you pointed out anon_inode (my grep skills
missed that), so thank you!
> > C'mon: /proc is used by everyone today and almost every program breaks
> > if it's not around. The string "/proc" is already de facto kernel ABI.
> > Let's just drop the pretense of /proc being optional and bake it into
> > the kernel proper, then give programs a way to get to /proc that isn't
> > tied to any particular mount configuration. This way, we don't need a
> > translate_pid(), since callers can just use procfs to do the same
> > thing. (That is, if I understand correctly what translate_pid does.)
>
> I'm not sure what you think translate_pid() is doing since you're not
> saying what you think it does.
> Examples from the old patchset:
> translate_pid(pid, ns, -1) - get pid in our pid namespace
> translate_pid(pid, -1, ns) - get pid in other pid namespace
> translate_pid(1, ns, -1) - get pid of init task for namespace
> translate_pid(pid, -1, ns) > 0 - is pid is reachable from ns?
> translate_pid(1, ns1, ns2) > 0 - is ns1 inside ns2?
> translate_pid(1, ns1, ns2) == 0 - is ns1 outside ns2?
> translate_pid(1, ns1, ns2) == 1 - is ns1 equal ns2?
>
> Allowing this syscall to yield pidfds as proper regular file fds and
> also taking pidfds as argument is an excellent way to kill a few
> problems at once:
> - cheap pid namespace introspection
> - creates a bridge between the "old" pid-based api and the "new" pidfd api
This second point would solve the problem of getting a new pidfd given a pid
indeed, without depending on /proc/<pid> at all. So kudos for that and I am
glad you are making it return pidfds (but correct me if I misunderstood what
you're planning to do with translate_fd). It also obviates any need for
dealing with procfs mount points.
> - allows us to get proper non-directory file descriptors for any pids we
> like
Here I got a bit lost. AIUI pidfd is a directory fd. Why would we want it to
not be a directory fd? That would be ambigiuous with what pidfd_send_signal
expects.
Also would it be a bad idea to extend translate_pid to also do what we want
for the pidfd_wait syscall? So translate_fd in this case would return an fd
that is just used for the pid's death status.
All of these extensions seem to mean translate_pid should probably take a
fourth parameter that tells it the target translation type?
They way I am hypothesizing, translate_pid, it should probably be
- translation to a pid in some ns
- translation of a pid to a pidfd
- translation of a pid to a "wait" fd which returns the death/reap process status.
If that makes sense, that would also avoid the need for a new syscall we are adding.
> The additional advantage is that people are already happy to add this
> syscall so simply extending it and routing it through the pidfd tree or
> Eric's tree is reasonable. (It should probably grow a flag argument. I
> need to start prototyping this.)
Great!
> >
> > We still need a pidfd_clone() for atomicity reasons, but that's a
> > separate story. My goal is to be able to write a library that
>
> Yes, on my todo list and I have a ported patch based on prior working
> rotting somehwere on my git server.
Is that different from using dup2 on a pidfd? Sorry I don't follow what is
pidfd_clone / why it is needed.
thanks,
- Joel
next prev parent reply other threads:[~2019-03-20 1:52 UTC|newest]
Thread overview: 125+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-10 20:34 [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android Sultan Alsawaf
2019-03-10 20:34 ` Sultan Alsawaf
2019-03-10 21:03 ` Greg Kroah-Hartman
2019-03-10 21:26 ` Sultan Alsawaf
2019-03-11 16:32 ` Joel Fernandes
2019-03-11 16:37 ` Joel Fernandes
2019-03-11 17:43 ` Michal Hocko
2019-03-11 17:58 ` Sultan Alsawaf
2019-03-11 20:10 ` Suren Baghdasaryan
2019-03-11 20:46 ` Sultan Alsawaf
2019-03-11 21:11 ` Joel Fernandes
2019-03-11 21:46 ` Sultan Alsawaf
2019-03-11 22:15 ` Suren Baghdasaryan
2019-03-11 22:36 ` Sultan Alsawaf
2019-03-12 8:05 ` Michal Hocko
2019-03-12 14:36 ` Suren Baghdasaryan
2019-03-12 15:25 ` Matthew Wilcox
2019-03-12 15:33 ` Michal Hocko
2019-03-12 15:39 ` Michal Hocko
2019-03-12 16:37 ` Sultan Alsawaf
2019-03-12 16:48 ` Michal Hocko
2019-03-12 16:58 ` Michal Hocko
2019-03-12 17:15 ` Suren Baghdasaryan
2019-03-12 17:17 ` Tim Murray
2019-03-12 17:45 ` Sultan Alsawaf
2019-03-12 18:43 ` Tim Murray
2019-03-12 18:50 ` Christian Brauner
2019-03-14 17:47 ` Joel Fernandes
2019-03-14 20:49 ` Sultan Alsawaf
2019-03-15 2:54 ` Joel Fernandes
2019-03-15 3:43 ` Sultan Alsawaf
2019-03-15 3:16 ` Steven Rostedt
2019-03-15 3:45 ` Sultan Alsawaf
2019-03-15 4:36 ` Daniel Colascione
2019-03-15 13:36 ` Joel Fernandes
2019-03-15 15:56 ` Suren Baghdasaryan
2019-03-15 16:12 ` Daniel Colascione
2019-03-15 16:43 ` Steven Rostedt
2019-03-15 17:17 ` Daniel Colascione
2019-03-15 18:03 ` Christian Brauner
2019-03-15 18:13 ` Joel Fernandes
2019-03-15 18:24 ` Christian Brauner
2019-03-15 18:49 ` Joel Fernandes
2019-03-16 17:31 ` Suren Baghdasaryan
2019-03-16 18:00 ` Daniel Colascione
2019-03-16 18:57 ` Christian Brauner
2019-03-16 19:37 ` Suren Baghdasaryan
2019-03-17 1:53 ` Joel Fernandes
2019-03-17 11:42 ` Christian Brauner
2019-03-17 15:40 ` Daniel Colascione
2019-03-18 0:29 ` Christian Brauner
2019-03-18 23:50 ` Joel Fernandes
2019-03-19 22:14 ` Christian Brauner
2019-03-19 22:26 ` Joel Fernandes
2019-03-19 22:48 ` Daniel Colascione
2019-03-19 23:10 ` Christian Brauner
2019-03-20 1:52 ` Joel Fernandes [this message]
2019-03-20 2:42 ` pidfd design Daniel Colascione
2019-03-20 3:59 ` Christian Brauner
2019-03-20 7:02 ` Daniel Colascione
2019-03-20 11:33 ` Joel Fernandes
2019-03-20 11:33 ` Joel Fernandes
2019-03-20 18:26 ` Christian Brauner
2019-03-20 18:38 ` Daniel Colascione
2019-03-20 18:51 ` Christian Brauner
2019-03-20 18:58 ` Andy Lutomirski
2019-03-20 19:14 ` Christian Brauner
2019-03-20 19:40 ` Daniel Colascione
2019-03-21 17:02 ` Andy Lutomirski
2019-03-25 20:13 ` Jann Horn
2019-03-25 20:13 ` Jann Horn
2019-03-25 20:23 ` Daniel Colascione
2019-03-25 20:23 ` Daniel Colascione
2019-03-25 23:42 ` Andy Lutomirski
2019-03-25 23:42 ` Andy Lutomirski
2019-03-25 23:45 ` Christian Brauner
2019-03-25 23:45 ` Christian Brauner
2019-03-26 0:00 ` Andy Lutomirski
2019-03-26 0:00 ` Andy Lutomirski
2019-03-26 0:12 ` Christian Brauner
2019-03-26 0:12 ` Christian Brauner
2019-03-26 0:24 ` Andy Lutomirski
2019-03-26 0:24 ` Andy Lutomirski
2019-03-28 9:21 ` Christian Brauner
2019-03-28 9:21 ` Christian Brauner
2019-03-20 19:19 ` Joel Fernandes
2019-03-20 19:29 ` Daniel Colascione
2019-03-24 14:44 ` Serge E. Hallyn
2019-03-24 18:48 ` Joel Fernandes
2019-03-20 19:11 ` Joel Fernandes
2019-05-07 2:16 ` [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android Sultan Alsawaf
2019-05-07 2:16 ` Sultan Alsawaf
2019-05-07 7:04 ` Greg Kroah-Hartman
2019-05-07 7:27 ` Sultan Alsawaf
2019-05-07 7:43 ` Greg Kroah-Hartman
2019-05-07 8:12 ` Sultan Alsawaf
2019-05-07 10:58 ` Christian Brauner
2019-05-07 16:28 ` Suren Baghdasaryan
2019-05-07 16:38 ` Christian Brauner
2019-05-07 16:53 ` Sultan Alsawaf
2019-05-07 20:01 ` Suren Baghdasaryan
2019-05-07 18:46 ` Joel Fernandes
2019-05-07 17:17 ` Sultan Alsawaf
2019-05-07 17:29 ` Greg Kroah-Hartman
2019-05-07 11:09 ` Greg Kroah-Hartman
2019-05-07 12:26 ` Michal Hocko
2019-05-07 15:31 ` Oleg Nesterov
2019-05-07 16:35 ` Sultan Alsawaf
2019-05-09 15:56 ` Oleg Nesterov
2019-05-09 18:33 ` Sultan Alsawaf
2019-05-10 15:10 ` Oleg Nesterov
2019-05-13 16:45 ` Sultan Alsawaf
2019-05-14 16:44 ` Steven Rostedt
2019-05-14 17:31 ` Sultan Alsawaf
2019-05-15 14:58 ` Oleg Nesterov
2019-05-15 17:27 ` Sultan Alsawaf
2019-05-15 17:27 ` Sultan Alsawaf
2019-05-15 18:32 ` Steven Rostedt
2019-05-15 18:52 ` Sultan Alsawaf
2019-05-15 20:09 ` Steven Rostedt
2019-05-16 13:54 ` Oleg Nesterov
2019-03-17 16:35 ` Serge E. Hallyn
2019-03-17 17:11 ` Daniel Colascione
2019-03-17 17:16 ` Serge E. Hallyn
2019-03-17 22:02 ` Suren Baghdasaryan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190320015249.GC129907@google.com \
--to=joel@joelfernandes.org \
--cc=arve@android.com \
--cc=christian@brauner.io \
--cc=dancol@google.com \
--cc=devel@driverdev.osuosl.org \
--cc=gregkh@linuxfoundation.org \
--cc=keescook@chromium.org \
--cc=kernel-team@android.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@amacapital.net \
--cc=maco@android.com \
--cc=mhocko@kernel.org \
--cc=mingo@redhat.com \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=serge@hallyn.com \
--cc=sultan@kerneltoast.com \
--cc=surenb@google.com \
--cc=timmurray@google.com \
--cc=tkjos@android.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.