From: Joel Fernandes <joel@joelfernandes.org>
To: Christian Brauner <christian@brauner.io>
Cc: "Daniel Colascione" <dancol@google.com>,
"Suren Baghdasaryan" <surenb@google.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Sultan Alsawaf" <sultan@kerneltoast.com>,
"Tim Murray" <timmurray@google.com>,
"Michal Hocko" <mhocko@kernel.org>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
"Arve Hjønnevåg" <arve@android.com>,
"Todd Kjos" <tkjos@android.com>,
"Martijn Coenen" <maco@android.com>,
"Ingo Molnar" <mingo@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
LKML <linux-kernel@vger.kernel.org>,
"open list:ANDROID DRIVERS" <devel@driverdev.osuosl.org>,
linux-mm <linux-mm@kvack.org>,
kernel-team <kernel-team@android.com>,
"Oleg Nesterov" <oleg@redhat.com>,
"Andy Lutomirski" <luto@amacapital.net>,
"Serge E. Hallyn" <serge@hallyn.com>,
keescook@chromium.org
Subject: Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android
Date: Tue, 19 Mar 2019 18:26:52 -0400 [thread overview]
Message-ID: <20190319222652.GA105485@google.com> (raw)
In-Reply-To: <20190319221415.baov7x6zoz7hvsno@brauner.io>
On Tue, Mar 19, 2019 at 11:14:17PM +0100, Christian Brauner wrote:
[snip]
> >
> > ---8<-----------------------
> >
> > From: Joel Fernandes <joelaf@google.com>
> > Subject: [PATCH] Partial skeleton prototype of pidfd_wait frontend
> >
> > Signed-off-by: Joel Fernandes <joelaf@google.com>
> > ---
> > arch/x86/entry/syscalls/syscall_32.tbl | 1 +
> > arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> > include/linux/syscalls.h | 1 +
> > include/uapi/asm-generic/unistd.h | 4 +-
> > kernel/signal.c | 62 ++++++++++++++++++++++++++
> > kernel/sys_ni.c | 3 ++
> > 6 files changed, 71 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> > index 1f9607ed087c..2a63f1896b63 100644
> > --- a/arch/x86/entry/syscalls/syscall_32.tbl
> > +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> > @@ -433,3 +433,4 @@
> > 425 i386 io_uring_setup sys_io_uring_setup __ia32_sys_io_uring_setup
> > 426 i386 io_uring_enter sys_io_uring_enter __ia32_sys_io_uring_enter
> > 427 i386 io_uring_register sys_io_uring_register __ia32_sys_io_uring_register
> > +428 i386 pidfd_wait sys_pidfd_wait __ia32_sys_pidfd_wait
> > diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> > index 92ee0b4378d4..cf2e08a8053b 100644
> > --- a/arch/x86/entry/syscalls/syscall_64.tbl
> > +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> > @@ -349,6 +349,7 @@
> > 425 common io_uring_setup __x64_sys_io_uring_setup
> > 426 common io_uring_enter __x64_sys_io_uring_enter
> > 427 common io_uring_register __x64_sys_io_uring_register
> > +428 common pidfd_wait __x64_sys_pidfd_wait
> >
> > #
> > # x32-specific system call numbers start at 512 to avoid cache impact
> > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> > index e446806a561f..62160970ed3f 100644
> > --- a/include/linux/syscalls.h
> > +++ b/include/linux/syscalls.h
> > @@ -988,6 +988,7 @@ asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len,
> > asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
> > siginfo_t __user *info,
> > unsigned int flags);
> > +asmlinkage long sys_pidfd_wait(int pidfd);
> >
> > /*
> > * Architecture-specific system calls
> > diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> > index dee7292e1df6..137aa8662230 100644
> > --- a/include/uapi/asm-generic/unistd.h
> > +++ b/include/uapi/asm-generic/unistd.h
> > @@ -832,9 +832,11 @@ __SYSCALL(__NR_io_uring_setup, sys_io_uring_setup)
> > __SYSCALL(__NR_io_uring_enter, sys_io_uring_enter)
> > #define __NR_io_uring_register 427
> > __SYSCALL(__NR_io_uring_register, sys_io_uring_register)
> > +#define __NR_pidfd_wait 428
> > +__SYSCALL(__NR_pidfd_wait, sys_pidfd_wait)
> >
> > #undef __NR_syscalls
> > -#define __NR_syscalls 428
> > +#define __NR_syscalls 429
> >
> > /*
> > * 32 bit systems traditionally used different
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index b7953934aa99..ebb550b87044 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -3550,6 +3550,68 @@ static int copy_siginfo_from_user_any(kernel_siginfo_t *kinfo, siginfo_t *info)
> > return copy_siginfo_from_user(kinfo, info);
> > }
> >
> > +static ssize_t pidfd_wait_read_iter(struct kiocb *iocb, struct iov_iter *to)
> > +{
> > + /*
> > + * This is just a test string, it will contain the actual
> > + * status of the pidfd in the future.
> > + */
> > + char buf[] = "status";
> > +
> > + return copy_to_iter(buf, strlen(buf)+1, to);
> > +}
> > +
> > +static const struct file_operations pidfd_wait_file_ops = {
> > + .read_iter = pidfd_wait_read_iter,
> > +};
> > +
> > +static struct inode *pidfd_wait_get_inode(struct super_block *sb)
> > +{
> > + struct inode *inode = new_inode(sb);
> > +
> > + inode->i_ino = get_next_ino();
> > + inode_init_owner(inode, NULL, S_IFREG);
> > +
> > + inode->i_op = &simple_dir_inode_operations;
> > + inode->i_fop = &pidfd_wait_file_ops;
> > +
> > + return inode;
> > +}
> > +
> > +SYSCALL_DEFINE1(pidfd_wait, int, pidfd)
> > +{
> > + struct fd f;
> > + struct inode *inode;
> > + struct file *file;
> > + int new_fd;
> > + struct pid_namespace *pid_ns;
> > + struct super_block *sb;
> > + struct vfsmount *mnt;
> > +
> > + f = fdget_raw(pidfd);
> > + if (!f.file)
> > + return -EBADF;
> > +
> > + sb = file_inode(f.file)->i_sb;
> > + pid_ns = sb->s_fs_info;
> > +
> > + inode = pidfd_wait_get_inode(sb);
> > +
> > + mnt = pid_ns->proc_mnt;
> > +
> > + file = alloc_file_pseudo(inode, mnt, "pidfd_wait", O_RDONLY,
> > + &pidfd_wait_file_ops);
>
> So I dislike the idea of allocating new inodes from the procfs super
> block. I would like to avoid pinning the whole pidfd concept exclusively
> to proc. The idea is that the pidfd API will be useable through procfs
> via open("/proc/<pid>") because that is what users expect and really
> wanted to have for a long time. So it makes sense to have this working.
> But it should really be useable without it. That's why translate_pid()
> and pidfd_clone() are on the table. What I'm saying is, once the pidfd
> api is "complete" you should be able to set CONFIG_PROCFS=N - even
> though that's crazy - and still be able to use pidfds. This is also a
> point akpm asked about when I did the pidfd_send_signal work.
Oh, ok. Somehow 'proc' and 'pid' sound very similar in terminology so
naturally I felt the proc fs superblock would be a fit, but I see your point.
> So instead of going throught proc we should probably do what David has
> been doing in the mount API and come to rely on anone_inode. So
> something like:
>
> fd = anon_inode_getfd("pidfd", &pidfd_fops, file_priv_data, flags);
>
> and stash information such as pid namespace etc. in a pidfd struct or
> something that we then can stash file->private_data of the new file.
> This also lets us avoid all this open coding done here.
> Another advantage is that anon_inodes is its own kernel-internal
> filesystem.
Thanks for the suggestion! Agreed this is better and will do it this way then.
thanks,
- Joel
next prev parent reply other threads:[~2019-03-19 22:26 UTC|newest]
Thread overview: 105+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-10 20:34 [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android Sultan Alsawaf
2019-03-10 21:03 ` Greg Kroah-Hartman
2019-03-10 21:26 ` Sultan Alsawaf
2019-03-11 16:32 ` Joel Fernandes
2019-03-11 16:37 ` Joel Fernandes
2019-03-11 17:43 ` Michal Hocko
2019-03-11 17:58 ` Sultan Alsawaf
2019-03-11 20:10 ` Suren Baghdasaryan
2019-03-11 20:46 ` Sultan Alsawaf
2019-03-11 21:11 ` Joel Fernandes
2019-03-11 21:46 ` Sultan Alsawaf
2019-03-11 22:15 ` Suren Baghdasaryan
2019-03-11 22:36 ` Sultan Alsawaf
2019-03-12 8:05 ` Michal Hocko
2019-03-12 14:36 ` Suren Baghdasaryan
2019-03-12 15:25 ` Matthew Wilcox
2019-03-12 15:33 ` Michal Hocko
2019-03-12 15:39 ` Michal Hocko
2019-03-12 16:37 ` Sultan Alsawaf
2019-03-12 16:48 ` Michal Hocko
2019-03-12 16:58 ` Michal Hocko
2019-03-12 17:15 ` Suren Baghdasaryan
2019-03-12 17:17 ` Tim Murray
2019-03-12 17:45 ` Sultan Alsawaf
2019-03-12 18:43 ` Tim Murray
2019-03-12 18:50 ` Christian Brauner
2019-03-14 17:47 ` Joel Fernandes
2019-03-14 20:49 ` Sultan Alsawaf
2019-03-15 2:54 ` Joel Fernandes
2019-03-15 3:43 ` Sultan Alsawaf
2019-03-15 3:16 ` Steven Rostedt
2019-03-15 3:45 ` Sultan Alsawaf
2019-03-15 4:36 ` Daniel Colascione
2019-03-15 13:36 ` Joel Fernandes
2019-03-15 15:56 ` Suren Baghdasaryan
2019-03-15 16:12 ` Daniel Colascione
2019-03-15 16:43 ` Steven Rostedt
2019-03-15 17:17 ` Daniel Colascione
2019-03-15 18:03 ` Christian Brauner
2019-03-15 18:13 ` Joel Fernandes
2019-03-15 18:24 ` Christian Brauner
2019-03-15 18:49 ` Joel Fernandes
2019-03-16 17:31 ` Suren Baghdasaryan
2019-03-16 18:00 ` Daniel Colascione
2019-03-16 18:57 ` Christian Brauner
2019-03-16 19:37 ` Suren Baghdasaryan
2019-03-17 1:53 ` Joel Fernandes
2019-03-17 11:42 ` Christian Brauner
2019-03-17 15:40 ` Daniel Colascione
2019-03-18 0:29 ` Christian Brauner
2019-03-18 23:50 ` Joel Fernandes
2019-03-19 22:14 ` Christian Brauner
2019-03-19 22:26 ` Joel Fernandes [this message]
2019-03-19 22:48 ` Daniel Colascione
2019-03-19 23:10 ` Christian Brauner
2019-03-20 1:52 ` Joel Fernandes
2019-03-20 2:42 ` pidfd design Daniel Colascione
2019-03-20 3:59 ` Christian Brauner
2019-03-20 7:02 ` Daniel Colascione
2019-03-20 11:33 ` Joel Fernandes
2019-03-20 18:26 ` Christian Brauner
2019-03-20 18:38 ` Daniel Colascione
2019-03-20 18:51 ` Christian Brauner
2019-03-20 18:58 ` Andy Lutomirski
2019-03-20 19:14 ` Christian Brauner
2019-03-20 19:40 ` Daniel Colascione
2019-03-21 17:02 ` Andy Lutomirski
2019-03-20 19:19 ` Joel Fernandes
2019-03-20 19:29 ` Daniel Colascione
2019-03-24 14:44 ` Serge E. Hallyn
2019-03-24 18:48 ` Joel Fernandes
2019-03-20 19:11 ` Joel Fernandes
2019-05-07 2:16 ` [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android Sultan Alsawaf
2019-05-07 7:04 ` Greg Kroah-Hartman
2019-05-07 7:27 ` Sultan Alsawaf
2019-05-07 7:43 ` Greg Kroah-Hartman
2019-05-07 8:12 ` Sultan Alsawaf
2019-05-07 10:58 ` Christian Brauner
2019-05-07 16:28 ` Suren Baghdasaryan
2019-05-07 16:38 ` Christian Brauner
2019-05-07 16:53 ` Sultan Alsawaf
2019-05-07 20:01 ` Suren Baghdasaryan
2019-05-07 18:46 ` Joel Fernandes
2019-05-07 17:17 ` Sultan Alsawaf
2019-05-07 17:29 ` Greg Kroah-Hartman
2019-05-07 11:09 ` Greg Kroah-Hartman
2019-05-07 12:26 ` Michal Hocko
2019-05-07 15:31 ` Oleg Nesterov
2019-05-07 16:35 ` Sultan Alsawaf
2019-05-09 15:56 ` Oleg Nesterov
2019-05-09 18:33 ` Sultan Alsawaf
2019-05-10 15:10 ` Oleg Nesterov
2019-05-13 16:45 ` Sultan Alsawaf
2019-05-14 16:44 ` Steven Rostedt
2019-05-14 17:31 ` Sultan Alsawaf
2019-05-15 14:58 ` Oleg Nesterov
2019-05-15 17:27 ` Sultan Alsawaf
2019-05-15 18:32 ` Steven Rostedt
2019-05-15 18:52 ` Sultan Alsawaf
2019-05-15 20:09 ` Steven Rostedt
2019-05-16 13:54 ` Oleg Nesterov
2019-03-17 16:35 ` Serge E. Hallyn
2019-03-17 17:11 ` Daniel Colascione
2019-03-17 17:16 ` Serge E. Hallyn
2019-03-17 22:02 ` Suren Baghdasaryan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190319222652.GA105485@google.com \
--to=joel@joelfernandes.org \
--cc=arve@android.com \
--cc=christian@brauner.io \
--cc=dancol@google.com \
--cc=devel@driverdev.osuosl.org \
--cc=gregkh@linuxfoundation.org \
--cc=keescook@chromium.org \
--cc=kernel-team@android.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@amacapital.net \
--cc=maco@android.com \
--cc=mhocko@kernel.org \
--cc=mingo@redhat.com \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=serge@hallyn.com \
--cc=sultan@kerneltoast.com \
--cc=surenb@google.com \
--cc=timmurray@google.com \
--cc=tkjos@android.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).