From: Minchan Kim <minchan@kernel.org>
To: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
linux-api@vger.kernel.org, oleksandr@redhat.com,
Suren Baghdasaryan <surenb@google.com>,
Tim Murray <timmurray@google.com>,
Daniel Colascione <dancol@google.com>,
Sandeep Patil <sspatil@google.com>,
Sonny Rao <sonnyrao@google.com>,
Brian Geffon <bgeffon@google.com>, Michal Hocko <mhocko@suse.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Shakeel Butt <shakeelb@google.com>,
John Dias <joaodias@google.com>,
Joel Fernandes <joel@joelfernandes.org>,
Jann Horn <jannh@google.com>,
alexander.h.duyck@linux.intel.com,
SeongJae Park <sjpark@amazon.com>,
David Rientjes <rientjes@google.com>,
Arjun Roy <arjunroy@google.com>,
Kirill Tkhai <ktkhai@virtuozzo.com>
Subject: Re: [PATCH] mm: use only pidfd for process_madvise syscall
Date: Tue, 19 May 2020 11:14:47 -0700 [thread overview]
Message-ID: <20200519181447.GA220547@google.com> (raw)
In-Reply-To: <20200519161956.GA66748@google.com>
On Tue, May 19, 2020 at 09:19:56AM -0700, Minchan Kim wrote:
> Hi Christian,
>
> On Tue, May 19, 2020 at 09:45:18AM +0200, Christian Brauner wrote:
> > On Fri, May 15, 2020 at 06:20:55PM -0700, Minchan Kim wrote:
> > > Based on discussion[1], people didn't feel we need to support both
> > > pid and pidfd for every new coming API[2] so this patch keeps only
> > > pidfd. This patch also changes flags's type with "unsigned int".
> > > So finally, the API is as follows,
> > >
> > > ssize_t process_madvise(int pidfd, const struct iovec *iovec,
> > > unsigned long vlen, int advice, unsigned int flags);
> > >
> > > DESCRIPTION
> > > The process_madvise() system call is used to give advice or directions
> > > to the kernel about the address ranges from external process as well as
> > > local process. It provides the advice to address ranges of process
> > > described by iovec and vlen. The goal of such advice is to improve system
> > > or application performance.
> > >
> > > The pidfd selects the process referred to by the PID file descriptor
> > > specified in pidfd. (See pidofd_open(2) for further information)
> > >
> > > The pointer iovec points to an array of iovec structures, defined in
> > > <sys/uio.h> as:
> > >
> > > struct iovec {
> > > void *iov_base; /* starting address */
> > > size_t iov_len; /* number of bytes to be advised */
> > > };
> > >
> > > The iovec describes address ranges beginning at address(iov_base)
> > > and with size length of bytes(iov_len).
> > >
> > > The vlen represents the number of elements in iovec.
> > >
> > > The advice is indicated in the advice argument, which is one of the
> > > following at this moment if the target process specified by idtype and
> > > id is external.
> > >
> > > MADV_COLD
> > > MADV_PAGEOUT
> > > MADV_MERGEABLE
> > > MADV_UNMERGEABLE
> > >
> > > Permission to provide a hint to external process is governed by a
> > > ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).
> > >
> > > The process_madvise supports every advice madvise(2) has if target
> > > process is in same thread group with calling process so user could
> > > use process_madvise(2) to extend existing madvise(2) to support
> > > vector address ranges.
> > >
> > > RETURN VALUE
> > > On success, process_madvise() returns the number of bytes advised.
> > > This return value may be less than the total number of requested
> > > bytes, if an error occurred. The caller should check return value
> > > to determine whether a partial advice occurred.
> > >
> > > [1] https://lore.kernel.org/linux-mm/20200509124817.xmrvsrq3mla6b76k@wittgenstein/
> > > [2] https://lore.kernel.org/linux-mm/9d849087-3359-c4ab-fbec-859e8186c509@virtuozzo.com/
> > > Signed-off-by: Minchan Kim <minchan@kernel.org>
> >
> > Thanks for the ping Minchan, and sorry for not replying earlier to this.
> >
> > Also, sorry that i delayed the patch but this here really seems a way
> > cleaner api to me and feels less hackish. In general this patch seems
> > fine to me.
>
> Thanks.
>
> > But two comments below:
> >
> > > ---
> > > mm/madvise.c | 42 +++++++++++++-----------------------------
> > > 1 file changed, 13 insertions(+), 29 deletions(-)
> > >
> > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > index d3fbbe52d230..35c9b220146a 100644
> > > --- a/mm/madvise.c
> > > +++ b/mm/madvise.c
> > > @@ -1229,8 +1229,8 @@ static int process_madvise_vec(struct task_struct *target_task,
> > > return ret;
> > > }
> > >
> > > -static ssize_t do_process_madvise(int which, pid_t upid, struct iov_iter *iter,
> > > - int behavior, unsigned long flags)
> > > +static ssize_t do_process_madvise(int pidfd, struct iov_iter *iter,
> > > + int behavior, unsigned int flags)
> > > {
> > > ssize_t ret;
> > > struct pid *pid;
> > > @@ -1241,26 +1241,12 @@ static ssize_t do_process_madvise(int which, pid_t upid, struct iov_iter *iter,
> > > if (flags != 0)
> > > return -EINVAL;
> > >
> > > - switch (which) {
> > > - case P_PID:
> > > - if (upid <= 0)
> > > - return -EINVAL;
> > > -
> > > - pid = find_get_pid(upid);
> > > - if (!pid)
> > > - return -ESRCH;
> > > - break;
> > > - case P_PIDFD:
> > > - if (upid < 0)
> > > - return -EINVAL;
> > > -
> > > - pid = pidfd_get_pid(upid);
> > > - if (IS_ERR(pid))
> > > - return PTR_ERR(pid);
> > > - break;
> > > - default:
> > > + if (pidfd < 0)
> > > return -EINVAL;
> >
> > When garbage file descriptors are passed EBADF needs to be returned, not
> > EINVAL. That's the case with most apis and also with pidfds, compare:
>
> True. Let me cook a patch for that.
Hi Andrew,
Please fold this patch against on last patch you have for process_madvise.
Thanks!
From 939e4c5b7ca12efc5d5eeb8ff55fc02752f70544 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Tue, 19 May 2020 11:06:57 -0700
Subject: [PATCH] mm: return EBADF if pidfd is invalid
This patch makes returning of EBADF when the fd passed as argument is
invalid. The implementaion relies on pidfd_get_pid's error return.
This patch also fixes syscall declare part since we removed pid support.
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
include/linux/compat.h | 6 +++---
include/linux/syscalls.h | 5 ++---
mm/madvise.c | 3 ---
3 files changed, 5 insertions(+), 9 deletions(-)
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 2e2f0a2700ab..86b61e873947 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -827,10 +827,10 @@ asmlinkage long compat_sys_pwritev64v2(unsigned long fd,
unsigned long vlen, loff_t pos, rwf_t flags);
#endif
-asmlinkage ssize_t compat_sys_process_madvise(compat_int_t which,
- compat_pid_t upid, const struct compat_iovec __user *vec,
+asmlinkage ssize_t compat_sys_process_madvise(compat_int_t pidfd,
+ const struct compat_iovec __user *vec,
compat_ulong_t vlen, compat_int_t behavior,
- compat_ulong_t flags);
+ compat_int_t flags);
/*
* Deprecated system calls which are still defined in
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 2cbd3660e8e6..63ffa6dc9da3 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -879,9 +879,8 @@ asmlinkage long sys_munlockall(void);
asmlinkage long sys_mincore(unsigned long start, size_t len,
unsigned char __user * vec);
asmlinkage long sys_madvise(unsigned long start, size_t len, int behavior);
-asmlinkage long sys_process_madvise(int which, pid_t upid,
- const struct iovec __user *vec, unsigned long vlen,
- int behavior, unsigned long flags);
+asmlinkage long sys_process_madvise(int pidfd, const struct iovec __user *vec,
+ unsigned long vlen, int behavior, unsigned int flags);
asmlinkage long sys_remap_file_pages(unsigned long start, unsigned long size,
unsigned long prot, unsigned long pgoff,
unsigned long flags);
diff --git a/mm/madvise.c b/mm/madvise.c
index 35c9b220146a..3ac1eda1203f 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1241,9 +1241,6 @@ static ssize_t do_process_madvise(int pidfd, struct iov_iter *iter,
if (flags != 0)
return -EINVAL;
- if (pidfd < 0)
- return -EINVAL;
-
pid = pidfd_get_pid(pidfd);
if (IS_ERR(pid))
return PTR_ERR(pid);
--
2.26.2.761.g0e0b3e54be-goog
prev parent reply other threads:[~2020-05-19 18:14 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-16 1:20 [PATCH] mm: use only pidfd for process_madvise syscall Minchan Kim
2020-05-18 19:22 ` Suren Baghdasaryan
2020-05-18 21:13 ` Minchan Kim
2020-05-18 23:06 ` Andrew Morton
2020-05-19 5:54 ` Minchan Kim
2020-05-19 7:45 ` Christian Brauner
2020-05-19 16:19 ` Minchan Kim
2020-05-19 18:14 ` Minchan Kim [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200519181447.GA220547@google.com \
--to=minchan@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=alexander.h.duyck@linux.intel.com \
--cc=arjunroy@google.com \
--cc=bgeffon@google.com \
--cc=christian.brauner@ubuntu.com \
--cc=dancol@google.com \
--cc=hannes@cmpxchg.org \
--cc=jannh@google.com \
--cc=joaodias@google.com \
--cc=joel@joelfernandes.org \
--cc=ktkhai@virtuozzo.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=oleksandr@redhat.com \
--cc=rientjes@google.com \
--cc=shakeelb@google.com \
--cc=sjpark@amazon.com \
--cc=sonnyrao@google.com \
--cc=sspatil@google.com \
--cc=surenb@google.com \
--cc=timmurray@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).