Linux userland API discussions

Linux userland API discussions
 help / color / mirror / Atom feed

* Re: [PATCHv10 man-pages 5/5] execveat.2: initial man page for execveat(2)
From: Rich Felker @ 2015-01-09 21:28 UTC (permalink / raw)
  To: Al Viro
  Cc: David Drysdale, Michael Kerrisk (man-pages), Eric W. Biederman,
	Andy Lutomirski, Meredydd Luff, linux-kernel@vger.kernel.org,
	Andrew Morton, David Miller, Thomas Gleixner, Stephen Rothwell,
	Oleg Nesterov, Ingo Molnar, H. Peter Anvin, Kees Cook,
	Arnd Bergmann, Christoph Hellwig, X86 ML, linux-arch, Linux API,
	sparclinux
In-Reply-To: <20150109210941.GL22149@ZenIV.linux.org.uk>

On Fri, Jan 09, 2015 at 09:09:41PM +0000, Al Viro wrote:
> On Fri, Jan 09, 2015 at 03:59:26PM -0500, Rich Felker wrote:
> 
> > > For fsck sake, folks, if you have bloody /proc, you don't need that shite
> > > at all!  Just do execve on /proc/self/fd/n, and be done with that.
> > > 
> > > The sole excuse for merging that thing in the first place had been
> > > "would anybody think of children^Wsclerotic^Whardened environments
> > > where they have no /proc at all".
> > 
> > That doesn't work. With O_CLOEXEC, /proc/self/fd/n is already gone at
> > the time the interpreter runs, whether you're using fexecveat or
> > execve with "/proc/self/fd/n" to implement POSIX fexecve(). That's the
> > problem. This breaks the intended idiom for fexecve.
> 
> Just what will your magical symlink do in case when the file is opened,
> unlinked and marked O_CLOEXEC?  When should actual freeing of disk blocks,
> etc. happen?  And no, you can't assume that interpreter will open the
> damn thing even once - there's nothing to oblige it to do so.

Unlinking is not relevant. Magical symlinks refer to open file
descriptions (either real ones or O_PATH inode-reference-only ones),
not files. There is no new complexity proposed for freeing disk blocks
here. Semantics are identical to existing O_PATH inode references.

> Al, more and more tempted to ask reverting the whole thing - this hardcoded
> /dev/fd/... (in fs/exec.c, no less) is disgraceful enough, but threats of
> even more revolting kludges in the name of "intended idiom for fexecve"...

If you have a multithreaded process that's executing an external
program via fexecve, then unless it has specialized knowledge about
what other parts of the program/libraries are doing, it needs to be
using O_CLOEXEC for the file descriptor. Otherwise, the file
descriptor could be leaked to child processes started by other
threads. This is what I mean by the "intended idiom". Note that it's
easier to use pathnames instead of fexecve, but doing so may not be an
option if the program needs to verify the file before exec'ing it.

This issue can be avoided if you're going to fork-and-fexecve rather
than replacing the calling process, since after forking it's safe to
remove the close-on-exec flag. But then you still have the issue that
the child process, after exec, keeps a spurious file descriptor to its
own process image (executable file) open which it can never close
(because it doesn't know the number). This could eventually lead to fd
exhaustion after many generations.

The "magic open-once magic symlink" approach is really the cleanest
solution I can find. In the case where the interpreter does not open
the script, nothing terribly bad happens; the magic symlink just
sticks around until _exit or exec. In the case where the interpreter
opens it more than once, you get a failure, but as far as I know
existing interpreters don't do this, and it's arguably bad design. In
any case it's a caught error.

Rich

^ permalink raw reply

* Re: [PATCHv10 man-pages 5/5] execveat.2: initial man page for execveat(2)
From: Eric W. Biederman @ 2015-01-09 21:20 UTC (permalink / raw)
  To: Rich Felker
  Cc: Al Viro, David Drysdale, Michael Kerrisk (man-pages),
	Andy Lutomirski, Meredydd Luff, linux-kernel@vger.kernel.org,
	Andrew Morton, David Miller, Thomas Gleixner, Stephen Rothwell,
	Oleg Nesterov, Ingo Molnar, H. Peter Anvin, Kees Cook,
	Arnd Bergmann, Christoph Hellwig, X86 ML, linux-arch, Linux API,
	sparclinux
In-Reply-To: <20150109205926.GT4574@brightrain.aerifal.cx>

Rich Felker <dalias@aerifal.cx> writes:

> On Fri, Jan 09, 2015 at 08:56:26PM +0000, Al Viro wrote:
>> On Fri, Jan 09, 2015 at 03:48:15PM -0500, Rich Felker wrote:
>> > I think this is a case that needs to be fixed, though it's hard. The
>> > normal correct usage for fexecve is to always pass an O_CLOEXEC file
>> > descriptor, and the caller can't really be expected to know whether
>> > the file is a script or not. We discussed workarounds before and one
>> > idea I proposed was having fexecve provide a "one open only" magic
>> > symlink in /proc/self/ to pass to the interpreter. It would behave
>> > like an O_PATH file descriptor magic symlink in /proc/self/fd, but
>> > would automatically cease to exist on the first open (at which point
>> > the interpreter would have a real O_RDONLY file descriptor for the
>> > underlying file).
>> 
>> For fsck sake, folks, if you have bloody /proc, you don't need that shite
>> at all!  Just do execve on /proc/self/fd/n, and be done with that.
>> 
>> The sole excuse for merging that thing in the first place had been
>> "would anybody think of children^Wsclerotic^Whardened environments
>> where they have no /proc at all".
>
> That doesn't work. With O_CLOEXEC, /proc/self/fd/n is already gone at
> the time the interpreter runs, whether you're using fexecveat or
> execve with "/proc/self/fd/n" to implement POSIX fexecve(). That's the
> problem. This breaks the intended idiom for fexecve.

O_CLOEXEC with a #! intepreter can not work.  If the file descriptor is
closed a #! interpreter can not open it.   So I don't know why or how
you want that to work but it is nonsense.

This certainly does not break the intended usage for execveat.

Eric


^ permalink raw reply

* Re: [PATCHv10 man-pages 5/5] execveat.2: initial man page for execveat(2)
From: Al Viro @ 2015-01-09 21:09 UTC (permalink / raw)
  To: Rich Felker
  Cc: David Drysdale, Michael Kerrisk (man-pages), Eric W. Biederman,
	Andy Lutomirski, Meredydd Luff,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Andrew Morton, David Miller, Thomas Gleixner, Stephen Rothwell,
	Oleg Nesterov, Ingo Molnar, H. Peter Anvin, Kees Cook,
	Arnd Bergmann, Christoph Hellwig, X86 ML, linux-arch, Linux API,
	sparclinux-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20150109205926.GT4574-C3MtFaGISjmo6RMmaWD+6Sb1p8zYI1N1@public.gmane.org>

On Fri, Jan 09, 2015 at 03:59:26PM -0500, Rich Felker wrote:

> > For fsck sake, folks, if you have bloody /proc, you don't need that shite
> > at all!  Just do execve on /proc/self/fd/n, and be done with that.
> > 
> > The sole excuse for merging that thing in the first place had been
> > "would anybody think of children^Wsclerotic^Whardened environments
> > where they have no /proc at all".
> 
> That doesn't work. With O_CLOEXEC, /proc/self/fd/n is already gone at
> the time the interpreter runs, whether you're using fexecveat or
> execve with "/proc/self/fd/n" to implement POSIX fexecve(). That's the
> problem. This breaks the intended idiom for fexecve.

Just what will your magical symlink do in case when the file is opened,
unlinked and marked O_CLOEXEC?  When should actual freeing of disk blocks,
etc. happen?  And no, you can't assume that interpreter will open the
damn thing even once - there's nothing to oblige it to do so.

Al, more and more tempted to ask reverting the whole thing - this hardcoded
/dev/fd/... (in fs/exec.c, no less) is disgraceful enough, but threats of
even more revolting kludges in the name of "intended idiom for fexecve"...

^ permalink raw reply

* Re: [PATCHv10 man-pages 5/5] execveat.2: initial man page for execveat(2)
From: Rich Felker @ 2015-01-09 20:59 UTC (permalink / raw)
  To: Al Viro
  Cc: David Drysdale, Michael Kerrisk (man-pages), Eric W. Biederman,
	Andy Lutomirski, Meredydd Luff,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Andrew Morton, David Miller, Thomas Gleixner, Stephen Rothwell,
	Oleg Nesterov, Ingo Molnar, H. Peter Anvin, Kees Cook,
	Arnd Bergmann, Christoph Hellwig, X86 ML, linux-arch, Linux API,
	sparclinux-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20150109205626.GK22149-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>

On Fri, Jan 09, 2015 at 08:56:26PM +0000, Al Viro wrote:
> On Fri, Jan 09, 2015 at 03:48:15PM -0500, Rich Felker wrote:
> > I think this is a case that needs to be fixed, though it's hard. The
> > normal correct usage for fexecve is to always pass an O_CLOEXEC file
> > descriptor, and the caller can't really be expected to know whether
> > the file is a script or not. We discussed workarounds before and one
> > idea I proposed was having fexecve provide a "one open only" magic
> > symlink in /proc/self/ to pass to the interpreter. It would behave
> > like an O_PATH file descriptor magic symlink in /proc/self/fd, but
> > would automatically cease to exist on the first open (at which point
> > the interpreter would have a real O_RDONLY file descriptor for the
> > underlying file).
> 
> For fsck sake, folks, if you have bloody /proc, you don't need that shite
> at all!  Just do execve on /proc/self/fd/n, and be done with that.
> 
> The sole excuse for merging that thing in the first place had been
> "would anybody think of children^Wsclerotic^Whardened environments
> where they have no /proc at all".

That doesn't work. With O_CLOEXEC, /proc/self/fd/n is already gone at
the time the interpreter runs, whether you're using fexecveat or
execve with "/proc/self/fd/n" to implement POSIX fexecve(). That's the
problem. This breaks the intended idiom for fexecve.

Rich

^ permalink raw reply

* Re: [PATCHv10 man-pages 5/5] execveat.2: initial man page for execveat(2)
From: Al Viro @ 2015-01-09 20:56 UTC (permalink / raw)
  To: Rich Felker
  Cc: David Drysdale, Michael Kerrisk (man-pages), Eric W. Biederman,
	Andy Lutomirski, Meredydd Luff, linux-kernel@vger.kernel.org,
	Andrew Morton, David Miller, Thomas Gleixner, Stephen Rothwell,
	Oleg Nesterov, Ingo Molnar, H. Peter Anvin, Kees Cook,
	Arnd Bergmann, Christoph Hellwig, X86 ML, linux-arch, Linux API,
	sparclinux
In-Reply-To: <20150109204815.GR4574@brightrain.aerifal.cx>

On Fri, Jan 09, 2015 at 03:48:15PM -0500, Rich Felker wrote:
> I think this is a case that needs to be fixed, though it's hard. The
> normal correct usage for fexecve is to always pass an O_CLOEXEC file
> descriptor, and the caller can't really be expected to know whether
> the file is a script or not. We discussed workarounds before and one
> idea I proposed was having fexecve provide a "one open only" magic
> symlink in /proc/self/ to pass to the interpreter. It would behave
> like an O_PATH file descriptor magic symlink in /proc/self/fd, but
> would automatically cease to exist on the first open (at which point
> the interpreter would have a real O_RDONLY file descriptor for the
> underlying file).

For fsck sake, folks, if you have bloody /proc, you don't need that shite
at all!  Just do execve on /proc/self/fd/n, and be done with that.

The sole excuse for merging that thing in the first place had been
"would anybody think of children^Wsclerotic^Whardened environments
where they have no /proc at all".

Sheesh...

^ permalink raw reply

* Re: [PATCHv10 man-pages 5/5] execveat.2: initial man page for execveat(2)
From: Rich Felker @ 2015-01-09 20:48 UTC (permalink / raw)
  To: David Drysdale
  Cc: Michael Kerrisk (man-pages), Eric W. Biederman, Andy Lutomirski,
	Alexander Viro, Meredydd Luff,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Andrew Morton, David Miller, Thomas Gleixner, Stephen Rothwell,
	Oleg Nesterov, Ingo Molnar, H. Peter Anvin, Kees Cook,
	Arnd Bergmann, Christoph Hellwig, X86 ML, linux-arch, Linux API,
	sparclinux-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CAHse=S88Jy5ZKM_VY5onfvxX7dTMngnxuHfuLeSuzvKvQNP19A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Fri, Jan 09, 2015 at 05:46:28PM +0000, David Drysdale wrote:
> > It's AT_EXECFN,
> > /proc/self/exe, and filenames shown elsewhere in /proc that may be
> > derived in odd ways.
> >
> > I would also move the text about O_CLOEXEC to a BUGS or NOTES section
> > rather than the main description. The long-term intent should be that
> > script execution this way should work. IIRC this was discussed earlier
> > in the thread.
> 
> I may be misremembering, but I thought we hoped to be able to fix
> execveat of a script without /proc in future, but didn't expect to fix
> execveat of a script via an O_CLOEXEC fd (because in the latter
> case the fd gets closed before the script interpreter runs, so even
> if the interpreter (or a special filesystem) does clever things for names
> starting with "/dev/fd/..." the file descriptor is already gone).

I think this is a case that needs to be fixed, though it's hard. The
normal correct usage for fexecve is to always pass an O_CLOEXEC file
descriptor, and the caller can't really be expected to know whether
the file is a script or not. We discussed workarounds before and one
idea I proposed was having fexecve provide a "one open only" magic
symlink in /proc/self/ to pass to the interpreter. It would behave
like an O_PATH file descriptor magic symlink in /proc/self/fd, but
would automatically cease to exist on the first open (at which point
the interpreter would have a real O_RDONLY file descriptor for the
underlying file).

Rich

^ permalink raw reply

* Re: [PATCHv10 man-pages 5/5] execveat.2: initial man page for execveat(2)
From: David Drysdale @ 2015-01-09 18:02 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Eric W. Biederman, Andy Lutomirski, Alexander Viro, Meredydd Luff,
	linux-kernel@vger.kernel.org, Andrew Morton, David Miller,
	Thomas Gleixner, Stephen Rothwell, Oleg Nesterov, Ingo Molnar,
	H. Peter Anvin, Kees Cook, Arnd Bergmann, Rich Felker,
	Christoph Hellwig, X86 ML, linux-arch, Linux API, sparclinux
In-Reply-To: <54AFF813.7050604@gmail.com>

On Fri, Jan 9, 2015 at 3:47 PM, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
> On 11/24/2014 12:53 PM, David Drysdale wrote:
>> Signed-off-by: David Drysdale <drysdale@google.com>
>> ---
>>  man2/execveat.2 | 153 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 153 insertions(+)
>>  create mode 100644 man2/execveat.2
>
> David,
>
> Thanks for the very nicely prepared man page. I've done
> a few very light edits, and will release the version below
> with the next man-pages release.

Many thanks, one error (of mine) in 2 places pointed out below.


> .TH EXECVEAT 2 2015-01-09 "Linux" "Linux Programmer's Manual"
> .SH NAME
> execveat \- execute program relative to a directory file descriptor
> .SH SYNOPSIS
> .B #include <unistd.h>
> .sp
> .BI "int execveat(int " dirfd ", const char *" pathname ","
> .br
> .BI "             char *const " argv "[], char *const " envp "[],"
> .br
> .BI "             int " flags );
> .SH DESCRIPTION
> .\" commit 51f39a1f0cea1cacf8c787f652f26dfee9611874
> The
> .BR execveat ()
> system call executes the program referred to by the combination of
> .I dirfd
> and
> .IR pathname .
> It operates in exactly the same way as
> .BR execve (2),
> except for the differences described in this manual page.
>
> If the pathname given in
> .I pathname
> is relative, then it is interpreted relative to the directory
> referred to by the file descriptor
> .I dirfd
> (rather than relative to the current working directory of
> the calling process, as is done by
> .BR execve (2)
> for a relative pathname).
>
> If
> .I pathname
> is relative and
> .I dirfd
> is the special value
> .BR AT_FDCWD ,
> then
> .I pathname
> is interpreted relative to the current working
> directory of the calling process (like
> .BR execve (2)).
>
> If
> .I pathname
> is absolute, then
> .I dirfd
> is ignored.
>
> If
> .I pathname
> is an empty string and the
> .BR AT_EMPTY_PATH
> flag is specified, then the file descriptor
> .I dirfd
> specifies the file to be executed (i.e.,
> .IR dirfd
> refers to an executable file, rather than a directory).
>
> The
> .I flags
> argument is a bit mask that can include zero or more of the following flags:
> .TP
> .BR AT_EMPTY_PATH
> If
> .I pathname
> is an empty string, operate on the file referred to by
> .IR dirfd
> (which may have been obtained using the
> .BR open (2)
> .B O_PATH
> flag).
> .TP
> .B AT_SYMLINK_NOFOLLOW
> If the file identified by
> .I dirfd
> and a non-NULL
> .I pathname
> is a symbolic link, then the call fails with the error
> .BR EINVAL .

Apologies, I think this should be ELOOP.

> .SH "RETURN VALUE"
> On success,
> .BR execveat ()
> does not return. On error \-1 is returned, and
> .I errno
> is set appropriately.
> .SH ERRORS
> The same errors that occur for
> .BR execve (2)
> can also occur for
> .BR execveat ().
> The following additional errors can occur for
> .BR execveat ():
> .TP
> .B EBADF
> .I dirfd
> is not a valid file descriptor.
> .TP
> .B EINVAL

ELOOP here too.

> .I flags
> includes
> .BR AT_SYMLINK_NOFOLLOW
> and the file identified by
> .I dirfd
> and a non-NULL
> .I pathname
> is a symbolic link.
> .TP
> .B EINVAL
> Invalid flag specified in
> .IR flags .
> .TP
> .B ENOENT
> The program identified by
> .I dirfd
> and
> .I pathname
> requires the use of an interpreter program
> (such as a script starting with "#!"), but the file descriptor
> .I dirfd
> was opened with the
> .B O_CLOEXEC
> flag, with the result that
> the program file is inaccessible to the launched interpreter.
> .TP
> .B ENOTDIR
> .I pathname
> is relative and
> .I dirfd
> is a file descriptor referring to a file other than a directory.
> .SH VERSIONS
> .BR execveat ()
> was added to Linux in kernel 3.19.
> GNU C library support is pending.
> .\" FIXME . check for glibc support in a future release
> .SH CONFORMING TO
> The
> .BR execveat ()
> system call is Linux-specific.
> .SH NOTES
> In addition to the reasons explained in
> .BR openat (2),
> the
> .BR execveat ()
> system call is also needed to allow
> .BR fexecve (3)
> to be implemented on systems that do not have the
> .I /proc
> filesystem mounted.
> .SH SEE ALSO
> .BR execve (2),
> .BR openat (2),
> .BR fexecve (3)
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply

* Re: [PATCHv10 man-pages 5/5] execveat.2: initial man page for execveat(2)
From: David Drysdale @ 2015-01-09 17:46 UTC (permalink / raw)
  To: Rich Felker
  Cc: Michael Kerrisk (man-pages), Eric W. Biederman, Andy Lutomirski,
	Alexander Viro, Meredydd Luff,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Andrew Morton, David Miller, Thomas Gleixner, Stephen Rothwell,
	Oleg Nesterov, Ingo Molnar, H. Peter Anvin, Kees Cook,
	Arnd Bergmann, Christoph Hellwig, X86 ML, linux-arch, Linux API,
	sparclinux-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20150109161302.GQ4574-C3MtFaGISjmo6RMmaWD+6Sb1p8zYI1N1@public.gmane.org>

On Fri, Jan 9, 2015 at 4:13 PM, Rich Felker <dalias-/miJ2pyFWUyWIDz0JBNUog@public.gmane.org> wrote:
> On Fri, Jan 09, 2015 at 04:47:31PM +0100, Michael Kerrisk (man-pages) wrote:
>> On 11/24/2014 12:53 PM, David Drysdale wrote:
>> > Signed-off-by: David Drysdale <drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>> > ---
>> >  man2/execveat.2 | 153 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >  1 file changed, 153 insertions(+)
>> >  create mode 100644 man2/execveat.2
>>
>> David,
>>
>> Thanks for the very nicely prepared man page. I've done
>> a few very light edits, and will release the version below
>> with the next man-pages release.
>>
>> I have one question. In the message accompanying
>> commit 51f39a1f0cea1cacf8c787f652f26dfee9611874 you wrote:
>>
>>   The filename fed to the executed program as argv[0] (or the name of the
>>   script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
>>   (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
>>   reflecting how the executable was found.  This does however mean that
>>   execution of a script in a /proc-less environment won't work; also, script
>>   execution via an O_CLOEXEC file descriptor fails (as the file will not be
>>   accessible after exec).
>>
>> How does one produce this situation where the execed program sees
>> argv[0] as a /dev/fd path? (i.e., what would the execveat()
>> call look like?) I tried to produce this scenario, but could not.
>
> I think this is wrong. argv[0] is an arbitrary string provided by the
> caller and would never be derived from the fd passed.

Yeah, I think I just wrote that wrong, it's only relevant for scripts.
As Rich says, for normal binaries argv[0] is just the argv[0] that
was passed into the execve[at] call.  For a script, the code in
fs/binfmt_script.c will remove the original argv[0] and put the
interpreter name and the script filename (e.g. "/bin/sh",
"/dev/fd/6/script") in as 2 arguments in its place.

[As an aside, IIRC the filename does get put into the new
process's memory, up above the environment strings -- but
that copy isn't visible via argv nor envp.]

> It's AT_EXECFN,
> /proc/self/exe, and filenames shown elsewhere in /proc that may be
> derived in odd ways.
>
> I would also move the text about O_CLOEXEC to a BUGS or NOTES section
> rather than the main description. The long-term intent should be that
> script execution this way should work. IIRC this was discussed earlier
> in the thread.

I may be misremembering, but I thought we hoped to be able to fix
execveat of a script without /proc in future, but didn't expect to fix
execveat of a script via an O_CLOEXEC fd (because in the latter
case the fd gets closed before the script interpreter runs, so even
if the interpreter (or a special filesystem) does clever things for names
starting with "/dev/fd/..." the file descriptor is already gone).

^ permalink raw reply

* Re: [PATCHv10 man-pages 5/5] execveat.2: initial man page for execveat(2)
From: Rich Felker @ 2015-01-09 16:13 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: David Drysdale, Eric W. Biederman, Andy Lutomirski,
	Alexander Viro, Meredydd Luff, linux-kernel, Andrew Morton,
	David Miller, Thomas Gleixner, Stephen Rothwell, Oleg Nesterov,
	Ingo Molnar, H. Peter Anvin, Kees Cook, Arnd Bergmann,
	Christoph Hellwig, x86, linux-arch, linux-api, sparclinux
In-Reply-To: <54AFF813.7050604@gmail.com>

On Fri, Jan 09, 2015 at 04:47:31PM +0100, Michael Kerrisk (man-pages) wrote:
> On 11/24/2014 12:53 PM, David Drysdale wrote:
> > Signed-off-by: David Drysdale <drysdale@google.com>
> > ---
> >  man2/execveat.2 | 153 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 153 insertions(+)
> >  create mode 100644 man2/execveat.2
> 
> David,
> 
> Thanks for the very nicely prepared man page. I've done 
> a few very light edits, and will release the version below 
> with the next man-pages release.
> 
> I have one question. In the message accompanying
> commit 51f39a1f0cea1cacf8c787f652f26dfee9611874 you wrote:
> 
>   The filename fed to the executed program as argv[0] (or the name of the
>   script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
>   (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
>   reflecting how the executable was found.  This does however mean that
>   execution of a script in a /proc-less environment won't work; also, script
>   execution via an O_CLOEXEC file descriptor fails (as the file will not be
>   accessible after exec).
> 
> How does one produce this situation where the execed program sees 
> argv[0] as a /dev/fd path? (i.e., what would the execveat()
> call look like?) I tried to produce this scenario, but could not.

I think this is wrong. argv[0] is an arbitrary string provided by the
caller and would never be derived from the fd passed. It's AT_EXECFN,
/proc/self/exe, and filenames shown elsewhere in /proc that may be
derived in odd ways.

I would also move the text about O_CLOEXEC to a BUGS or NOTES section
rather than the main description. The long-term intent should be that
script execution this way should work. IIRC this was discussed earlier
in the thread.

Rich

^ permalink raw reply

* Re: [PATCHv10 man-pages 5/5] execveat.2: initial man page for execveat(2)
From: Michael Kerrisk (man-pages) @ 2015-01-09 15:47 UTC (permalink / raw)
  To: David Drysdale, Eric W. Biederman, Andy Lutomirski,
	Alexander Viro, Meredydd Luff, linux-kernel, Andrew Morton,
	David Miller, Thomas Gleixner
  Cc: mtk.manpages, Stephen Rothwell, Oleg Nesterov, Ingo Molnar,
	H. Peter Anvin, Kees Cook, Arnd Bergmann, Rich Felker,
	Christoph Hellwig, x86, linux-arch, linux-api, sparclinux
In-Reply-To: <1416830039-21952-6-git-send-email-drysdale@google.com>

On 11/24/2014 12:53 PM, David Drysdale wrote:
> Signed-off-by: David Drysdale <drysdale@google.com>
> ---
>  man2/execveat.2 | 153 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 153 insertions(+)
>  create mode 100644 man2/execveat.2

David,

Thanks for the very nicely prepared man page. I've done 
a few very light edits, and will release the version below 
with the next man-pages release.

I have one question. In the message accompanying
commit 51f39a1f0cea1cacf8c787f652f26dfee9611874 you wrote:

  The filename fed to the executed program as argv[0] (or the name of the
  script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
  (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
  reflecting how the executable was found.  This does however mean that
  execution of a script in a /proc-less environment won't work; also, script
  execution via an O_CLOEXEC file descriptor fails (as the file will not be
  accessible after exec).

How does one produce this situation where the execed program sees 
argv[0] as a /dev/fd path? (i.e., what would the execveat()
call look like?) I tried to produce this scenario, but could not.

Cheers,

Michael

.\" Copyright (c) 2014 Google, Inc.
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date.  The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein.  The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.TH EXECVEAT 2 2015-01-09 "Linux" "Linux Programmer's Manual"
.SH NAME
execveat \- execute program relative to a directory file descriptor
.SH SYNOPSIS
.B #include <unistd.h>
.sp
.BI "int execveat(int " dirfd ", const char *" pathname ","
.br
.BI "             char *const " argv "[], char *const " envp "[],"
.br
.BI "             int " flags );
.SH DESCRIPTION
.\" commit 51f39a1f0cea1cacf8c787f652f26dfee9611874
The
.BR execveat ()
system call executes the program referred to by the combination of
.I dirfd
and
.IR pathname .
It operates in exactly the same way as
.BR execve (2),
except for the differences described in this manual page.

If the pathname given in
.I pathname
is relative, then it is interpreted relative to the directory
referred to by the file descriptor
.I dirfd
(rather than relative to the current working directory of
the calling process, as is done by
.BR execve (2)
for a relative pathname).

If
.I pathname
is relative and
.I dirfd
is the special value
.BR AT_FDCWD ,
then
.I pathname
is interpreted relative to the current working
directory of the calling process (like
.BR execve (2)).

If
.I pathname
is absolute, then
.I dirfd
is ignored.

If
.I pathname
is an empty string and the
.BR AT_EMPTY_PATH
flag is specified, then the file descriptor
.I dirfd
specifies the file to be executed (i.e.,
.IR dirfd
refers to an executable file, rather than a directory).

The
.I flags
argument is a bit mask that can include zero or more of the following flags:
.TP
.BR AT_EMPTY_PATH
If
.I pathname
is an empty string, operate on the file referred to by
.IR dirfd
(which may have been obtained using the
.BR open (2)
.B O_PATH
flag).
.TP
.B AT_SYMLINK_NOFOLLOW
If the file identified by
.I dirfd
and a non-NULL
.I pathname
is a symbolic link, then the call fails with the error
.BR EINVAL .
.SH "RETURN VALUE"
On success,
.BR execveat ()
does not return. On error \-1 is returned, and
.I errno
is set appropriately.
.SH ERRORS
The same errors that occur for
.BR execve (2)
can also occur for
.BR execveat ().
The following additional errors can occur for
.BR execveat ():
.TP
.B EBADF
.I dirfd
is not a valid file descriptor.
.TP
.B EINVAL
.I flags
includes
.BR AT_SYMLINK_NOFOLLOW
and the file identified by
.I dirfd
and a non-NULL
.I pathname
is a symbolic link.
.TP
.B EINVAL
Invalid flag specified in
.IR flags .
.TP
.B ENOENT
The program identified by
.I dirfd
and
.I pathname
requires the use of an interpreter program
(such as a script starting with "#!"), but the file descriptor
.I dirfd
was opened with the
.B O_CLOEXEC
flag, with the result that
the program file is inaccessible to the launched interpreter.
.TP
.B ENOTDIR
.I pathname
is relative and
.I dirfd
is a file descriptor referring to a file other than a directory.
.SH VERSIONS
.BR execveat ()
was added to Linux in kernel 3.19.
GNU C library support is pending.
.\" FIXME . check for glibc support in a future release
.SH CONFORMING TO
The
.BR execveat ()
system call is Linux-specific.
.SH NOTES
In addition to the reasons explained in
.BR openat (2),
the
.BR execveat ()
system call is also needed to allow
.BR fexecve (3)
to be implemented on systems that do not have the
.I /proc
filesystem mounted.
.SH SEE ALSO
.BR execve (2),
.BR openat (2),
.BR fexecve (3)

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply

* Re: [PATCH RESEND v3] gpio: lib-sysfs: Add 'wakeup' attribute
From: Sören Brinkmann @ 2015-01-09 15:24 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Alexandre Courbot, linux-api, linux-kernel@vger.kernel.org,
	linux-gpio@vger.kernel.org, linux-doc@vger.kernel.org
In-Reply-To: <CACRpkda5mAtves=PZibd6jdVP1ztM0EYA+cnw7Gs+mSVXTkqpw@mail.gmail.com>

On Fri, 2015-01-09 at 10:32AM +0100, Linus Walleij wrote:
> On Mon, Jan 5, 2015 at 7:16 PM, Soren Brinkmann
> <soren.brinkmann@xilinx.com> wrote:
> 
> > Add an attribute 'wakeup' to the GPIO sysfs interface which allows
> > marking/unmarking a GPIO as wake IRQ.
> > The file 'wakeup' is created in each exported GPIOs directory, if an IRQ
> > is associated with that GPIO and the irqchip implements set_wake().
> > Writing 'enabled' to that file will enable wake for that GPIO, while
> > writing 'disabled' will disable wake.
> > Reading that file will return either 'disabled' or 'enabled' depening on
> > the currently set flag for the GPIO's IRQ.
> >
> > Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
> > Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
> > ---
> > v3:
> >  - add documentation
> > v2:
> >  - fix error path to unlock mutex before return
> 
> Finally applied for v3.20.
> 
> Sorry for being a bit conservative and picky at times...

Thank you.

	Sören

^ permalink raw reply

* Re: [v8 2/5] ext4: adds project ID support
From: Jan Kara @ 2015-01-09  9:47 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Jan Kara, Li Xi, Linux Filesystem Development List,
	ext4 development, linux-api, Theodore Ts'o, Al Viro,
	Christoph Hellwig, Dmitry Monakhov
In-Reply-To: <2EE35986-B398-432E-92D4-EEF875381319@dilger.ca>

On Thu 08-01-15 15:20:21, Andreas Dilger wrote:
> On Jan 8, 2015, at 1:26 AM, Jan Kara <jack@suse.cz> wrote:
> > On Tue 09-12-14 13:22:25, Li Xi wrote:
> >> This patch adds a new internal field of ext4 inode to save project
> >> identifier. Also a new flag EXT4_INODE_PROJINHERIT is added for
> >> inheriting project ID from parent directory.
> >  I have noticed one thing you apparently changed in v7 of the patch set.
> > See below.
> > 
> >> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> >> index 29c43e7..8bd1da9 100644
> >> --- a/fs/ext4/ext4.h
> >> +++ b/fs/ext4/ext4.h
> >> @@ -377,16 +377,18 @@ struct flex_groups {
> >> #define EXT4_EA_INODE_FL	        0x00200000 /* Inode used for large EA */
> >> #define EXT4_EOFBLOCKS_FL		0x00400000 /* Blocks allocated beyond EOF */
> >> #define EXT4_INLINE_DATA_FL		0x10000000 /* Inode has inline data. */
> >> +#define EXT4_PROJINHERIT_FL		FS_PROJINHERIT_FL /* Create with parents projid */
> >  How did FS_PROJINHERIT_FL get here? There used to be 0x20000000 in older
> > version of the patch set which is correct - this definition is defining
> > ext4 on-disk format. As such it is an ext4 specific flag and should be
> > definined to a fixed constant independed of any other filesystem. It seems
> > you are somewhat mixing what is an on-disk format flag value and what is a
> > flag value passed from userspace. These two may be different things and
> > you need to convert between the values when getting / setting flags...
> 
> Currently the EXT4_*_FL and FS_*_FL values are all identical, and there
> is no reason to change that before it is actually needed.  Since the
> FS_PROJINHERIT_FL is used via chattr/lsattr from userspace, this value
> must also be kept the same in the future to avoid API breakage, so there
> is no reason to worry about incompatibilities.
  Agreed. I was somewhat worried about having on-disk flag defined through
the external non-ext4 define but you are right that neither can really
change once we ship a kernel with it.

> See also the [v8 5/5] patch, which is changing the EXT4_*_FL values to
> use FS_*_FL constants, where applicable, so that it is more clear that
> these values need to be the same.
  OK, I've missed that. So if things will be consistent again, I'm fine
with the change.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH RESEND v3] gpio: lib-sysfs: Add 'wakeup' attribute
From: Linus Walleij @ 2015-01-09  9:32 UTC (permalink / raw)
  To: Soren Brinkmann
  Cc: Alexandre Courbot, linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-gpio-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-doc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1420481779-22841-1-git-send-email-soren.brinkmann-gjFFaj9aHVfQT0dZR+AlfA@public.gmane.org>

On Mon, Jan 5, 2015 at 7:16 PM, Soren Brinkmann
<soren.brinkmann-gjFFaj9aHVfQT0dZR+AlfA@public.gmane.org> wrote:

> Add an attribute 'wakeup' to the GPIO sysfs interface which allows
> marking/unmarking a GPIO as wake IRQ.
> The file 'wakeup' is created in each exported GPIOs directory, if an IRQ
> is associated with that GPIO and the irqchip implements set_wake().
> Writing 'enabled' to that file will enable wake for that GPIO, while
> writing 'disabled' will disable wake.
> Reading that file will return either 'disabled' or 'enabled' depening on
> the currently set flag for the GPIO's IRQ.
>
> Signed-off-by: Soren Brinkmann <soren.brinkmann-gjFFaj9aHVfQT0dZR+AlfA@public.gmane.org>
> Reviewed-by: Alexandre Courbot <acourbot-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> ---
> v3:
>  - add documentation
> v2:
>  - fix error path to unlock mutex before return

Finally applied for v3.20.

Sorry for being a bit conservative and picky at times...

Yours,
Linus Walleij

^ permalink raw reply

* [PATCH 6/6] selftests: Set CC using CROSS_COMPILE once in lib.mk
From: Michael Ellerman @ 2015-01-09  9:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: mmarek, gregkh, akpm, rostedt, mingo, davem, keescook,
	tranmanphong, cov, dh.herrmann, hughd, bobby.prani, serge.hallyn,
	ebiederm, tim.bird, josh, koct9i, linux-kbuild, linux-api, netdev,
	shuahkh
In-Reply-To: <1420794375-31881-1-git-send-email-mpe@ellerman.id.au>

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 tools/testing/selftests/efivarfs/Makefile | 1 -
 tools/testing/selftests/exec/Makefile     | 1 -
 tools/testing/selftests/kcmp/Makefile     | 1 -
 tools/testing/selftests/lib.mk            | 6 ++++++
 tools/testing/selftests/net/Makefile      | 1 -
 tools/testing/selftests/powerpc/Makefile  | 3 +--
 tools/testing/selftests/size/Makefile     | 2 --
 tools/testing/selftests/vm/Makefile       | 1 -
 8 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/efivarfs/Makefile b/tools/testing/selftests/efivarfs/Makefile
index 3052d0bda24b..d683486a859b 100644
--- a/tools/testing/selftests/efivarfs/Makefile
+++ b/tools/testing/selftests/efivarfs/Makefile
@@ -1,4 +1,3 @@
-CC = $(CROSS_COMPILE)gcc
 CFLAGS = -Wall
 
 test_objs = open-unlink create-read
diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile
index 886cabe307b1..4edb7d0da29b 100644
--- a/tools/testing/selftests/exec/Makefile
+++ b/tools/testing/selftests/exec/Makefile
@@ -1,4 +1,3 @@
-CC = $(CROSS_COMPILE)gcc
 CFLAGS = -Wall
 BINARIES = execveat
 DEPS = execveat.symlink execveat.denatured script subdir
diff --git a/tools/testing/selftests/kcmp/Makefile b/tools/testing/selftests/kcmp/Makefile
index 0eecd183058c..2ae7450a9a89 100644
--- a/tools/testing/selftests/kcmp/Makefile
+++ b/tools/testing/selftests/kcmp/Makefile
@@ -1,4 +1,3 @@
-CC := $(CROSS_COMPILE)$(CC)
 CFLAGS += -I../../../../usr/include/
 
 all: kcmp_test
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 7bd3dabe2846..abae16396c43 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -1,3 +1,9 @@
+# When we're called from kbuild $(CC) already contains $(CROSS_COMPILE), so
+# here we need to use "cc", otherwise we'll get $(CROSS_COMPILE) twice. The
+# only downside is it breaks someone overriding $(CC), but that's probably OK,
+# they can probably cope by changing their path.
+CC := $(CROSS_COMPILE)cc
+
 define RUN_TESTS
 	@for TEST in $(TEST_PROGS); do \
 		(./$$TEST && echo "selftests: $$TEST [PASS]") || echo "selftests: $$TEST [FAIL]"; \
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 6ba2ac7bbb0d..fac4782c51d8 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -1,6 +1,5 @@
 # Makefile for net selftests
 
-CC = $(CROSS_COMPILE)gcc
 CFLAGS = -Wall -O2 -g
 
 CFLAGS += -I../../../../usr/include/
diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile
index d2d19db7eda7..af3882a01e33 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -8,10 +8,9 @@ ifeq ($(ARCH),powerpc)
 
 GIT_VERSION = $(shell git describe --always --long --dirty || echo "unknown")
 
-CC := $(CROSS_COMPILE)$(CC)
 CFLAGS := -Wall -O2 -flto -Wall -Werror -DGIT_VERSION='"$(GIT_VERSION)"' -I$(CURDIR) $(CFLAGS)
 
-export CC CFLAGS
+export CFLAGS
 
 TARGETS = pmu copyloops mm tm primitives
 
diff --git a/tools/testing/selftests/size/Makefile b/tools/testing/selftests/size/Makefile
index e4353d74ea6e..bbd0b5398b61 100644
--- a/tools/testing/selftests/size/Makefile
+++ b/tools/testing/selftests/size/Makefile
@@ -1,5 +1,3 @@
-CC = $(CROSS_COMPILE)gcc
-
 all: get_size
 
 get_size: get_size.c
diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index c0462182ec37..0dd26947fc68 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -1,6 +1,5 @@
 # Makefile for vm selftests
 
-CC = $(CROSS_COMPILE)gcc
 CFLAGS = -Wall
 BINARIES = hugepage-mmap hugepage-shm map_hugetlb thuge-gen hugetlbfstest
 BINARIES += transhuge-stress
-- 
2.1.0

^ permalink raw reply related

* [PATCH 5/6] kbuild: Don't pass -rR to selftest makefiles
From: Michael Ellerman @ 2015-01-09  9:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: mmarek, gregkh, akpm, rostedt, mingo, davem, keescook,
	tranmanphong, cov, dh.herrmann, hughd, bobby.prani, serge.hallyn,
	ebiederm, tim.bird, josh, koct9i, linux-kbuild, linux-api, netdev,
	shuahkh
In-Reply-To: <1420794375-31881-1-git-send-email-mpe@ellerman.id.au>

The makefiles under tools/testing/selftests are not real kbuild
makefiles, they are regular stand alone makefiles. As such they *do*
want all the standard implicit rules and variables defined.

So before calling those makefiles, filter -rR out of MAKEFLAGS.

Without this not all the selftests are built correctly when called via
the top-level Makefile.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index c186a928d8be..8bbd0bc7329c 100644
--- a/Makefile
+++ b/Makefile
@@ -1079,12 +1079,12 @@ INSTALL_SELFTESTS_PATH = $(abspath $(objtree)/selftests)
 
 PHONY += kselftest
 kselftest:
-	$(Q)$(MAKE) -C tools/testing/selftests run_tests
+	$(Q)$(MAKE) -C tools/testing/selftests MAKEFLAGS="$(filter-out rR,$(MAKEFLAGS))" run_tests
 
 # Kernel selftest install
 PHONY += kselftest_install
 kselftest_install:
-	$(Q)$(MAKE) -C tools/testing/selftests INSTALL_PATH=$(INSTALL_SELFTESTS_PATH) install
+	$(Q)$(MAKE) -C tools/testing/selftests MAKEFLAGS="$(filter-out rR,$(MAKEFLAGS))" INSTALL_PATH=$(INSTALL_SELFTESTS_PATH) install
 
 # ---------------------------------------------------------------------------
 # Modules
-- 
2.1.0


^ permalink raw reply related

* [PATCH 4/6] kbuild: add a new kselftest_install make target to install selftests
From: Michael Ellerman @ 2015-01-09  9:06 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: mmarek-AlSwsSmVLrQ, gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	rostedt-nx8X9YLhiw1AfugRpC6u6w, mingo-H+wXaHxf7aLQT0dZR+AlfA,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, keescook-F7+t8E8rja9g9hUCZPvPmw,
	tranmanphong-Re5JQEeQqe8AvxtiuMwx3w, cov-sgV2jX0FEOL9JmXXK+q4OQ,
	dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w, hughd-hpIqsD4AKlfQT0dZR+AlfA,
	bobby.prani-Re5JQEeQqe8AvxtiuMwx3w,
	serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, tim.bird-/MT0OVThwyLZJqsBc5GL+g,
	josh-iaAMLnmF4UmaiuxdJuQwMA, koct9i-Re5JQEeQqe8AvxtiuMwx3w,
	linux-kbuild-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	shuahkh-JPH+aEBZ4P+UEJcrhfAQsw
In-Reply-To: <1420794375-31881-1-git-send-email-mpe-Gsx/Oe8HsFggBc27wqDAHg@public.gmane.org>

Add a new make target to install kernel selftests. This new target will
build and install selftests.

The default is just $(objtree)/selftests. This is preferable to
something based on $(INSTALL_MOD_PATH) (which defaults to /), as it
allows a normal user to install the tests. This is similar to the
default behaviour of make headers_install.

Therefore the most basic usage is:

$ make kselftests_install
$ ./selftests/all.sh

To install elsewhere use:

$ make kselftests_install INSTALL_SELFTESTS_PATH=/some/where
$ /some/where/all.sh

Signed-off-by: Shuah Khan <shuahkh-JPH+aEBZ4P+UEJcrhfAQsw@public.gmane.org>
Signed-off-by: Michael Ellerman <mpe-Gsx/Oe8HsFggBc27wqDAHg@public.gmane.org>
---
 Makefile                         | 13 ++++++++++++-
 tools/testing/selftests/Makefile |  1 +
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index eb4eca56843a..c186a928d8be 100644
--- a/Makefile
+++ b/Makefile
@@ -1072,12 +1072,20 @@ headers_check: headers_install
 	$(Q)$(MAKE) $(hdr-inst)=arch/$(hdr-arch)/include/uapi/asm $(hdr-dst) HDRCHECK=1
 
 # ---------------------------------------------------------------------------
-# Kernel selftest
+# Kernel selftest targets
+
+# Default base path for kselftest install
+INSTALL_SELFTESTS_PATH = $(abspath $(objtree)/selftests)
 
 PHONY += kselftest
 kselftest:
 	$(Q)$(MAKE) -C tools/testing/selftests run_tests
 
+# Kernel selftest install
+PHONY += kselftest_install
+kselftest_install:
+	$(Q)$(MAKE) -C tools/testing/selftests INSTALL_PATH=$(INSTALL_SELFTESTS_PATH) install
+
 # ---------------------------------------------------------------------------
 # Modules
 
@@ -1286,6 +1294,9 @@ help:
 	@echo  '                    Build, install, and boot kernel before'
 	@echo  '                    running kselftest on it'
 	@echo  ''
+	@echo  '  kselftest_install - Install selftests to INSTALL_SELFTESTS_PATH'
+	@echo  '                      default: $(INSTALL_SELFTESTS_PATH)'
+	@echo  ''
 	@echo  'Kernel packaging:'
 	@$(MAKE) $(build)=$(package-dir) help
 	@echo  ''
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 4f2849b5ff77..a2345f4512bb 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -62,6 +62,7 @@ ifdef INSTALL_PATH
 
 	@# Ask all targets to emit their test scripts
 	echo "#!/bin/bash\n\n" > $(ALL_SCRIPT)
+	echo "cd \$$(dirname \$$0)" >> $(ALL_SCRIPT)
 	echo "ROOT=\$$PWD\n" >> $(ALL_SCRIPT)
 
 	for TARGET in $(TARGETS); do \
-- 
2.1.0

^ permalink raw reply related

* [PATCH 3/6] selftests: Add install support for the powerpc tests
From: Michael Ellerman @ 2015-01-09  9:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: mmarek, gregkh, akpm, rostedt, mingo, davem, keescook,
	tranmanphong, cov, dh.herrmann, hughd, bobby.prani, serge.hallyn,
	ebiederm, tim.bird, josh, koct9i, linux-kbuild, linux-api, netdev,
	shuahkh
In-Reply-To: <1420794375-31881-1-git-send-email-mpe@ellerman.id.au>

The bulk of the selftests are actually below the powerpc sub directory.

This adds support for installing them, when on a powerpc machine, or if
ARCH and CROSS_COMPILE are set appropriately.

This is a little more complicated because of the sub directory structure
under powerpc, but much of the common logic in lib.mk is still used. The
net effect of the patch is still a reduction in code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 tools/testing/selftests/powerpc/Makefile           | 19 ++++++++-
 tools/testing/selftests/powerpc/copyloops/Makefile | 15 +++----
 tools/testing/selftests/powerpc/mm/Makefile        | 15 +++----
 tools/testing/selftests/powerpc/pmu/Makefile       | 48 ++++++++++++----------
 tools/testing/selftests/powerpc/pmu/ebb/Makefile   | 13 +++---
 .../testing/selftests/powerpc/primitives/Makefile  | 15 +++----
 tools/testing/selftests/powerpc/tm/Makefile        | 15 +++----
 7 files changed, 68 insertions(+), 72 deletions(-)

diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile
index f6ff90a76bd7..d2d19db7eda7 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -22,10 +22,25 @@ all: $(TARGETS)
 $(TARGETS):
 	$(MAKE) -k -C $@ all
 
-run_tests: all
+include ../lib.mk
+
+override define RUN_TESTS
 	@for TARGET in $(TARGETS); do \
 		$(MAKE) -C $$TARGET run_tests; \
 	done;
+endef
+
+override define INSTALL_RULE
+	@for TARGET in $(TARGETS); do \
+		$(MAKE) -C $$TARGET install; \
+	done;
+endef
+
+override define EMIT_TESTS
+	@for TARGET in $(TARGETS); do \
+		$(MAKE) -s -C $$TARGET emit_tests; \
+	done;
+endef
 
 clean:
 	@for TARGET in $(TARGETS); do \
@@ -36,4 +51,4 @@ clean:
 tags:
 	find . -name '*.c' -o -name '*.h' | xargs ctags
 
-.PHONY: all run_tests clean tags $(TARGETS)
+.PHONY: tags $(TARGETS)
diff --git a/tools/testing/selftests/powerpc/copyloops/Makefile b/tools/testing/selftests/powerpc/copyloops/Makefile
index 6f2d3be227f9..c05023514ce8 100644
--- a/tools/testing/selftests/powerpc/copyloops/Makefile
+++ b/tools/testing/selftests/powerpc/copyloops/Makefile
@@ -6,24 +6,19 @@ CFLAGS += -D SELFTEST
 # Use our CFLAGS for the implicit .S rule
 ASFLAGS = $(CFLAGS)
 
-PROGS := copyuser_64 copyuser_power7 memcpy_64 memcpy_power7
+TEST_PROGS := copyuser_64 copyuser_power7 memcpy_64 memcpy_power7
 EXTRA_SOURCES := validate.c ../harness.c
 
-all: $(PROGS)
+all: $(TEST_PROGS)
 
 copyuser_64:     CPPFLAGS += -D COPY_LOOP=test___copy_tofrom_user_base
 copyuser_power7: CPPFLAGS += -D COPY_LOOP=test___copy_tofrom_user_power7
 memcpy_64:       CPPFLAGS += -D COPY_LOOP=test_memcpy
 memcpy_power7:   CPPFLAGS += -D COPY_LOOP=test_memcpy_power7
 
-$(PROGS): $(EXTRA_SOURCES)
+$(TEST_PROGS): $(EXTRA_SOURCES)
 
-run_tests: all
-	@-for PROG in $(PROGS); do \
-		./$$PROG; \
-	done;
+include ../../lib.mk
 
 clean:
-	rm -f $(PROGS) *.o
-
-.PHONY: all run_tests clean
+	rm -f $(TEST_PROGS) *.o
diff --git a/tools/testing/selftests/powerpc/mm/Makefile b/tools/testing/selftests/powerpc/mm/Makefile
index 357ccbd6bad9..e3b5fa360520 100644
--- a/tools/testing/selftests/powerpc/mm/Makefile
+++ b/tools/testing/selftests/powerpc/mm/Makefile
@@ -1,18 +1,13 @@
 noarg:
 	$(MAKE) -C ../
 
-PROGS := hugetlb_vs_thp_test
+TEST_PROGS := hugetlb_vs_thp_test
 
-all: $(PROGS)
+all: $(TEST_PROGS)
 
-$(PROGS): ../harness.c
+$(TEST_PROGS): ../harness.c
 
-run_tests: all
-	@-for PROG in $(PROGS); do \
-		./$$PROG; \
-	done;
+include ../../lib.mk
 
 clean:
-	rm -f $(PROGS)
-
-.PHONY: all run_tests clean
+	rm -f $(TEST_PROGS)
diff --git a/tools/testing/selftests/powerpc/pmu/Makefile b/tools/testing/selftests/powerpc/pmu/Makefile
index c9f4263906a5..5a161175bbd4 100644
--- a/tools/testing/selftests/powerpc/pmu/Makefile
+++ b/tools/testing/selftests/powerpc/pmu/Makefile
@@ -1,38 +1,42 @@
 noarg:
 	$(MAKE) -C ../
 
-PROGS := count_instructions l3_bank_test per_event_excludes
+TEST_PROGS := count_instructions l3_bank_test per_event_excludes
 EXTRA_SOURCES := ../harness.c event.c lib.c
 
-SUB_TARGETS = ebb
+all: $(TEST_PROGS) ebb
 
-all: $(PROGS) $(SUB_TARGETS)
-
-$(PROGS): $(EXTRA_SOURCES)
+$(TEST_PROGS): $(EXTRA_SOURCES)
 
 # loop.S can only be built 64-bit
 count_instructions: loop.S count_instructions.c $(EXTRA_SOURCES)
 	$(CC) $(CFLAGS) -m64 -o $@ $^
 
-run_tests: all sub_run_tests
-	@-for PROG in $(PROGS); do \
-		./$$PROG; \
-	done;
+include ../../lib.mk
 
-clean: sub_clean
-	rm -f $(PROGS) loop.o
+DEFAULT_RUN_TESTS := $(RUN_TESTS)
+override define RUN_TESTS
+	$(DEFAULT_RUN_TESTS)
+	$(MAKE) -C ebb run_tests
+endef
 
-$(SUB_TARGETS):
-	$(MAKE) -k -C $@ all
+DEFAULT_EMIT_TESTS := $(EMIT_TESTS)
+override define EMIT_TESTS
+	$(DEFAULT_EMIT_TESTS)
+	$(MAKE) -s -C ebb emit_tests
+endef
 
-sub_run_tests: all
-	@for TARGET in $(SUB_TARGETS); do \
-		$(MAKE) -C $$TARGET run_tests; \
-	done;
+DEFAULT_INSTALL := $(INSTALL_RULE)
+override define INSTALL_RULE
+	$(DEFAULT_INSTALL_RULE)
+	$(MAKE) -C ebb install
+endef
 
-sub_clean:
-	@for TARGET in $(SUB_TARGETS); do \
-		$(MAKE) -C $$TARGET clean; \
-	done;
+clean:
+	rm -f $(TEST_PROGS) loop.o
+	$(MAKE) -C ebb clean
+
+ebb:
+	$(MAKE) -k -C $@ all
 
-.PHONY: all run_tests clean sub_run_tests sub_clean $(SUB_TARGETS)
+.PHONY: all run_tests clean ebb
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/Makefile b/tools/testing/selftests/powerpc/pmu/ebb/Makefile
index 3dc4332698cb..5cdc9dbf2b27 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/Makefile
+++ b/tools/testing/selftests/powerpc/pmu/ebb/Makefile
@@ -4,7 +4,7 @@ noarg:
 # The EBB handler is 64-bit code and everything links against it
 CFLAGS += -m64
 
-PROGS := reg_access_test event_attributes_test cycles_test	\
+TEST_PROGS := reg_access_test event_attributes_test cycles_test	\
 	 cycles_with_freeze_test pmc56_overflow_test		\
 	 ebb_vs_cpu_event_test cpu_event_vs_ebb_test		\
 	 cpu_event_pinned_vs_ebb_test task_event_vs_ebb_test	\
@@ -16,18 +16,15 @@ PROGS := reg_access_test event_attributes_test cycles_test	\
 	 lost_exception_test no_handler_test			\
 	 cycles_with_mmcr2_test
 
-all: $(PROGS)
+all: $(TEST_PROGS)
 
-$(PROGS): ../../harness.c ../event.c ../lib.c ebb.c ebb_handler.S trace.c busy_loop.S
+$(TEST_PROGS): ../../harness.c ../event.c ../lib.c ebb.c ebb_handler.S trace.c busy_loop.S
 
 instruction_count_test: ../loop.S
 
 lost_exception_test: ../lib.c
 
-run_tests: all
-	@-for PROG in $(PROGS); do \
-		./$$PROG; \
-	done;
+include ../../../lib.mk
 
 clean:
-	rm -f $(PROGS)
+	rm -f $(TEST_PROGS)
diff --git a/tools/testing/selftests/powerpc/primitives/Makefile b/tools/testing/selftests/powerpc/primitives/Makefile
index ea737ca01732..b68c6221d3d1 100644
--- a/tools/testing/selftests/powerpc/primitives/Makefile
+++ b/tools/testing/selftests/powerpc/primitives/Makefile
@@ -1,17 +1,12 @@
 CFLAGS += -I$(CURDIR)
 
-PROGS := load_unaligned_zeropad
+TEST_PROGS := load_unaligned_zeropad
 
-all: $(PROGS)
+all: $(TEST_PROGS)
 
-$(PROGS): ../harness.c
+$(TEST_PROGS): ../harness.c
 
-run_tests: all
-	@-for PROG in $(PROGS); do \
-		./$$PROG; \
-	done;
+include ../../lib.mk
 
 clean:
-	rm -f $(PROGS) *.o
-
-.PHONY: all run_tests clean
+	rm -f $(TEST_PROGS) *.o
diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile
index 2cede239a074..34f2ec634b40 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -1,15 +1,10 @@
-PROGS := tm-resched-dscr
+TEST_PROGS := tm-resched-dscr
 
-all: $(PROGS)
+all: $(TEST_PROGS)
 
-$(PROGS): ../harness.c
+$(TEST_PROGS): ../harness.c
 
-run_tests: all
-	@-for PROG in $(PROGS); do \
-		./$$PROG; \
-	done;
+include ../../lib.mk
 
 clean:
-	rm -f $(PROGS) *.o
-
-.PHONY: all run_tests clean
+	rm -f $(TEST_PROGS) *.o
-- 
2.1.0

^ permalink raw reply related

* [PATCH 2/6] selftests: Add install target
From: Michael Ellerman @ 2015-01-09  9:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: mmarek, gregkh, akpm, rostedt, mingo, davem, keescook,
	tranmanphong, cov, dh.herrmann, hughd, bobby.prani, serge.hallyn,
	ebiederm, tim.bird, josh, koct9i, linux-kbuild, linux-api, netdev,
	shuahkh
In-Reply-To: <1420794375-31881-1-git-send-email-mpe@ellerman.id.au>

This adds make install support to selftests. The basic usage is:

$ cd tools/testing/selftests
$ make install

That installs into tools/testing/selftests/install, which can then be
copied where ever necessary.

The install destination is also configurable using eg:

$ INSTALL_PATH=/mnt/selftests make install

The implementation uses two targets in the child makefiles. The first
"install" is expected to install all files into $(INSTALL_PATH).

The second, "emit_tests", is expected to emit the test instructions (ie.
bash script) on stdout. Separating this from install means the child
makefiles need no knowledge of the location of the test script.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 tools/testing/selftests/Makefile                | 32 +++++++++++++++++++++++++
 tools/testing/selftests/exec/Makefile           |  3 +++
 tools/testing/selftests/lib.mk                  | 23 +++++++++++++++++-
 tools/testing/selftests/memory-hotplug/Makefile |  2 ++
 tools/testing/selftests/mount/Makefile          |  2 ++
 tools/testing/selftests/mqueue/Makefile         |  7 ++++++
 tools/testing/selftests/net/Makefile            |  1 +
 tools/testing/selftests/sysctl/Makefile         |  1 +
 8 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 4e511221a0c1..4f2849b5ff77 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -47,7 +47,39 @@ clean_hotplug:
 		make -C $$TARGET clean; \
 	done;
 
+INSTALL_PATH ?= install
+INSTALL_PATH := $(abspath $(INSTALL_PATH))
+ALL_SCRIPT := $(INSTALL_PATH)/all.sh
+
+install:
+ifdef INSTALL_PATH
+	@# Ask all targets to install their files
+	mkdir -p $(INSTALL_PATH)
+	for TARGET in $(TARGETS); do \
+		mkdir -p $(INSTALL_PATH)/$$TARGET ; \
+		make -C $$TARGET INSTALL_PATH=$(INSTALL_PATH)/$$TARGET install; \
+	done;
+
+	@# Ask all targets to emit their test scripts
+	echo "#!/bin/bash\n\n" > $(ALL_SCRIPT)
+	echo "ROOT=\$$PWD\n" >> $(ALL_SCRIPT)
+
+	for TARGET in $(TARGETS); do \
+		echo "echo ; echo Running tests in $$TARGET" >> $(ALL_SCRIPT); \
+		echo "echo ========================================" >> $(ALL_SCRIPT); \
+		echo "cd $$TARGET" >> $(ALL_SCRIPT); \
+		make -s -C $$TARGET emit_tests >> $(ALL_SCRIPT); \
+		echo "cd \$$ROOT\n" >> $(ALL_SCRIPT); \
+	done;
+
+	chmod u+x $(ALL_SCRIPT)
+else
+	$(error Error: set INSTALL_PATH to use install)
+endif
+
 clean:
 	for TARGET in $(TARGETS); do \
 		make -C $$TARGET clean; \
 	done;
+
+.PHONY: install
diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile
index a0098daeb73d..886cabe307b1 100644
--- a/tools/testing/selftests/exec/Makefile
+++ b/tools/testing/selftests/exec/Makefile
@@ -19,8 +19,11 @@ execveat.denatured: execveat
 	$(CC) $(CFLAGS) -o $@ $^
 
 TEST_PROGS := execveat
+TEST_FILES := $(DEPS)
 
 include ../lib.mk
 
+override EMIT_TESTS := echo "mkdir -p subdir; (./execveat && echo \"selftests: execveat [PASS]\") || echo \"selftests: execveat [FAIL]\""
+
 clean:
 	rm -rf $(BINARIES) $(DEPS) subdir.moved execveat.moved xxxxx*
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index b30c5a49cb61..7bd3dabe2846 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -7,4 +7,25 @@ endef
 run_tests: all
 	$(RUN_TESTS)
 
-.PHONY: run_tests all clean
+define INSTALL_RULE
+	mkdir -p $(INSTALL_PATH)
+	install -t $(INSTALL_PATH) $(TEST_PROGS) $(TEST_FILES)
+endef
+
+install: all
+ifdef INSTALL_PATH
+	$(INSTALL_RULE)
+else
+	$(error Error: set INSTALL_PATH to use install)
+endif
+
+define EMIT_TESTS
+	@for TEST in $(TEST_PROGS); do \
+		echo "(./$$TEST && echo \"selftests: $$TEST [PASS]\") || echo \"selftests: $$TEST [FAIL]\""; \
+	done;
+endef
+
+emit_tests:
+	$(EMIT_TESTS)
+
+.PHONY: run_tests all clean install emit_tests
diff --git a/tools/testing/selftests/memory-hotplug/Makefile b/tools/testing/selftests/memory-hotplug/Makefile
index 8f7dea66ecac..598a1f68f534 100644
--- a/tools/testing/selftests/memory-hotplug/Makefile
+++ b/tools/testing/selftests/memory-hotplug/Makefile
@@ -2,7 +2,9 @@ all:
 
 include ../lib.mk
 
+TEST_PROGS := on-off-test.sh
 override RUN_TESTS := ./on-off-test.sh -r 2 || echo "selftests: memory-hotplug [FAIL]"
+override EMIT_TESTS := echo "$(RUN_TESTS)"
 
 run_full_test:
 	@/bin/bash ./on-off-test.sh || echo "memory-hotplug selftests: [FAIL]"
diff --git a/tools/testing/selftests/mount/Makefile b/tools/testing/selftests/mount/Makefile
index 06931dfd3ef0..a5b367f032ba 100644
--- a/tools/testing/selftests/mount/Makefile
+++ b/tools/testing/selftests/mount/Makefile
@@ -7,7 +7,9 @@ unprivileged-remount-test: unprivileged-remount-test.c
 
 include ../lib.mk
 
+TEST_PROGS := unprivileged-remount-test
 override RUN_TESTS := if [ -f /proc/self/uid_map ] ; then ./unprivileged-remount-test ; fi
+override EMIT_TESTS := echo "$(RUN_TESTS)"
 
 clean:
 	rm -f unprivileged-remount-test
diff --git a/tools/testing/selftests/mqueue/Makefile b/tools/testing/selftests/mqueue/Makefile
index cbc300ef11bf..6ca7261b55dc 100644
--- a/tools/testing/selftests/mqueue/Makefile
+++ b/tools/testing/selftests/mqueue/Makefile
@@ -9,5 +9,12 @@ override define RUN_TESTS
 	@./mq_perf_tests || echo "selftests: mq_perf_tests [FAIL]"
 endef
 
+TEST_PROGS := mq_open_tests mq_perf_tests
+
+override define EMIT_TESTS
+	echo "./mq_open_tests /test1 || echo \"selftests: mq_open_tests [FAIL]\""
+	echo "./mq_perf_tests || echo \"selftests: mq_perf_tests [FAIL]\""
+endef
+
 clean:
 	rm -f mq_open_tests mq_perf_tests
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index fa8187ff15e6..6ba2ac7bbb0d 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -12,6 +12,7 @@ all: $(NET_PROGS)
 	$(CC) $(CFLAGS) -o $@ $^
 
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh
+TEST_FILES := $(NET_PROGS)
 
 include ../lib.mk
 
diff --git a/tools/testing/selftests/sysctl/Makefile b/tools/testing/selftests/sysctl/Makefile
index c9660f5ef9f9..b3c33e071f10 100644
--- a/tools/testing/selftests/sysctl/Makefile
+++ b/tools/testing/selftests/sysctl/Makefile
@@ -5,6 +5,7 @@
 all:
 
 TEST_PROGS := run_numerictests run_stringtests
+TEST_FILES := common_tests
 
 include ../lib.mk
 
-- 
2.1.0


^ permalink raw reply related

* [PATCH 1/6] selftests: Introduce minimal shared logic for running tests
From: Michael Ellerman @ 2015-01-09  9:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: mmarek, gregkh, akpm, rostedt, mingo, davem, keescook,
	tranmanphong, cov, dh.herrmann, hughd, bobby.prani, serge.hallyn,
	ebiederm, tim.bird, josh, koct9i, linux-kbuild, linux-api, netdev,
	shuahkh

This adds a Make include file which most selftests can then include to
get the run_tests logic.

On its own this has the advantage of some reduction in repetition, and
also means the pass/fail message is defined in fewer places.

However the key advantage is it will allow us to implement install very
simply in a subsequent patch.

The default implementation just executes each program in $(TEST_PROGS).

We use a variable to hold the default implementation of $(RUN_TESTS)
because that gives us a clean way to override it if necessary, ie. using
override. The mount, memory-hotplug and mqueue tests use that to provide
a different implementation.

Tests are not run via /bin/bash, so if they are scripts they must be
executable, we add u+x to several.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 tools/testing/selftests/breakpoints/Makefile       |  5 +++--
 tools/testing/selftests/cpu-hotplug/Makefile       |  5 +++--
 tools/testing/selftests/cpu-hotplug/on-off-test.sh |  0
 tools/testing/selftests/efivarfs/Makefile          |  5 +++--
 tools/testing/selftests/efivarfs/efivarfs.sh       |  0
 tools/testing/selftests/exec/Makefile              |  5 +++--
 tools/testing/selftests/firmware/Makefile          | 20 ++------------------
 tools/testing/selftests/firmware/fw_filesystem.sh  |  0
 tools/testing/selftests/firmware/fw_userhelper.sh  |  0
 tools/testing/selftests/ftrace/Makefile            |  5 +++--
 tools/testing/selftests/ipc/Makefile               |  5 +++--
 tools/testing/selftests/kcmp/Makefile              |  5 +++--
 tools/testing/selftests/lib.mk                     | 10 ++++++++++
 tools/testing/selftests/memfd/Makefile             |  6 +++---
 tools/testing/selftests/memory-hotplug/Makefile    |  5 +++--
 tools/testing/selftests/mount/Makefile             |  8 ++------
 tools/testing/selftests/mqueue/Makefile            |  9 ++++++---
 tools/testing/selftests/net/Makefile               |  8 ++++----
 tools/testing/selftests/ptrace/Makefile            |  5 +++--
 tools/testing/selftests/size/Makefile              |  5 +++--
 tools/testing/selftests/sysctl/Makefile            | 11 ++---------
 tools/testing/selftests/timers/Makefile            |  5 +++--
 tools/testing/selftests/user/Makefile              |  5 +++--
 tools/testing/selftests/vm/Makefile                |  5 +++--
 24 files changed, 68 insertions(+), 69 deletions(-)
 mode change 100644 => 100755 tools/testing/selftests/cpu-hotplug/on-off-test.sh
 mode change 100644 => 100755 tools/testing/selftests/efivarfs/efivarfs.sh
 mode change 100644 => 100755 tools/testing/selftests/firmware/fw_filesystem.sh
 mode change 100644 => 100755 tools/testing/selftests/firmware/fw_userhelper.sh
 create mode 100644 tools/testing/selftests/lib.mk

diff --git a/tools/testing/selftests/breakpoints/Makefile b/tools/testing/selftests/breakpoints/Makefile
index e18b42b254af..182235640209 100644
--- a/tools/testing/selftests/breakpoints/Makefile
+++ b/tools/testing/selftests/breakpoints/Makefile
@@ -16,8 +16,9 @@ else
 	echo "Not an x86 target, can't build breakpoints selftests"
 endif
 
-run_tests:
-	@./breakpoint_test || echo "breakpoints selftests: [FAIL]"
+TEST_PROGS := breakpoint_test
+
+include ../lib.mk
 
 clean:
 	rm -fr breakpoint_test
diff --git a/tools/testing/selftests/cpu-hotplug/Makefile b/tools/testing/selftests/cpu-hotplug/Makefile
index e9c28d8dc84b..15f02591d22c 100644
--- a/tools/testing/selftests/cpu-hotplug/Makefile
+++ b/tools/testing/selftests/cpu-hotplug/Makefile
@@ -1,7 +1,8 @@
 all:
 
-run_tests:
-	@/bin/bash ./on-off-test.sh || echo "cpu-hotplug selftests: [FAIL]"
+TEST_PROGS := on-off-test.sh
+
+include ../lib.mk
 
 run_full_test:
 	@/bin/bash ./on-off-test.sh -a || echo "cpu-hotplug selftests: [FAIL]"
diff --git a/tools/testing/selftests/cpu-hotplug/on-off-test.sh b/tools/testing/selftests/cpu-hotplug/on-off-test.sh
old mode 100644
new mode 100755
diff --git a/tools/testing/selftests/efivarfs/Makefile b/tools/testing/selftests/efivarfs/Makefile
index 29e8c6bc81b0..3052d0bda24b 100644
--- a/tools/testing/selftests/efivarfs/Makefile
+++ b/tools/testing/selftests/efivarfs/Makefile
@@ -5,8 +5,9 @@ test_objs = open-unlink create-read
 
 all: $(test_objs)
 
-run_tests: all
-	@/bin/bash ./efivarfs.sh || echo "efivarfs selftests: [FAIL]"
+TEST_PROGS := efivarfs.sh
+
+include ../lib.mk
 
 clean:
 	rm -f $(test_objs)
diff --git a/tools/testing/selftests/efivarfs/efivarfs.sh b/tools/testing/selftests/efivarfs/efivarfs.sh
old mode 100644
new mode 100755
diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile
index 66dfc2ce1788..a0098daeb73d 100644
--- a/tools/testing/selftests/exec/Makefile
+++ b/tools/testing/selftests/exec/Makefile
@@ -18,8 +18,9 @@ execveat.denatured: execveat
 %: %.c
 	$(CC) $(CFLAGS) -o $@ $^
 
-run_tests: all
-	./execveat
+TEST_PROGS := execveat
+
+include ../lib.mk
 
 clean:
 	rm -rf $(BINARIES) $(DEPS) subdir.moved execveat.moved xxxxx*
diff --git a/tools/testing/selftests/firmware/Makefile b/tools/testing/selftests/firmware/Makefile
index e23cce0bbc3a..9bf82234855b 100644
--- a/tools/testing/selftests/firmware/Makefile
+++ b/tools/testing/selftests/firmware/Makefile
@@ -3,25 +3,9 @@
 # No binaries, but make sure arg-less "make" doesn't trigger "run_tests"
 all:
 
-fw_filesystem:
-	@if /bin/sh ./fw_filesystem.sh ; then \
-                echo "fw_filesystem: ok"; \
-        else \
-                echo "fw_filesystem: [FAIL]"; \
-                exit 1; \
-        fi
+TEST_PROGS := fw_filesystem.sh fw_userhelper.sh
 
-fw_userhelper:
-	@if /bin/sh ./fw_userhelper.sh ; then \
-                echo "fw_userhelper: ok"; \
-        else \
-                echo "fw_userhelper: [FAIL]"; \
-                exit 1; \
-        fi
-
-run_tests: all fw_filesystem fw_userhelper
+include ../lib.mk
 
 # Nothing to clean up.
 clean:
-
-.PHONY: all clean run_tests fw_filesystem fw_userhelper
diff --git a/tools/testing/selftests/firmware/fw_filesystem.sh b/tools/testing/selftests/firmware/fw_filesystem.sh
old mode 100644
new mode 100755
diff --git a/tools/testing/selftests/firmware/fw_userhelper.sh b/tools/testing/selftests/firmware/fw_userhelper.sh
old mode 100644
new mode 100755
diff --git a/tools/testing/selftests/ftrace/Makefile b/tools/testing/selftests/ftrace/Makefile
index 76cc9f156267..346720639d1d 100644
--- a/tools/testing/selftests/ftrace/Makefile
+++ b/tools/testing/selftests/ftrace/Makefile
@@ -1,7 +1,8 @@
 all:
 
-run_tests:
-	@/bin/sh ./ftracetest || echo "ftrace selftests: [FAIL]"
+TEST_PROGS := ftracetest
+
+include ../lib.mk
 
 clean:
 	rm -rf logs/*
diff --git a/tools/testing/selftests/ipc/Makefile b/tools/testing/selftests/ipc/Makefile
index 74bbefdeaf4c..3b6013da4f59 100644
--- a/tools/testing/selftests/ipc/Makefile
+++ b/tools/testing/selftests/ipc/Makefile
@@ -18,8 +18,9 @@ else
 	echo "Not an x86 target, can't build msgque selftest"
 endif
 
-run_tests: all
-	./msgque_test
+TEST_PROGS := msgque_test
+
+include ../lib.mk
 
 clean:
 	rm -fr ./msgque_test
diff --git a/tools/testing/selftests/kcmp/Makefile b/tools/testing/selftests/kcmp/Makefile
index ff0eefdc6ceb..0eecd183058c 100644
--- a/tools/testing/selftests/kcmp/Makefile
+++ b/tools/testing/selftests/kcmp/Makefile
@@ -3,8 +3,9 @@ CFLAGS += -I../../../../usr/include/
 
 all: kcmp_test
 
-run_tests: all
-	@./kcmp_test || echo "kcmp_test: [FAIL]"
+TEST_PROGS := kcmp_test
+
+include ../lib.mk
 
 clean:
 	$(RM) kcmp_test kcmp-test-file
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
new file mode 100644
index 000000000000..b30c5a49cb61
--- /dev/null
+++ b/tools/testing/selftests/lib.mk
@@ -0,0 +1,10 @@
+define RUN_TESTS
+	@for TEST in $(TEST_PROGS); do \
+		(./$$TEST && echo "selftests: $$TEST [PASS]") || echo "selftests: $$TEST [FAIL]"; \
+	done;
+endef
+
+run_tests: all
+	$(RUN_TESTS)
+
+.PHONY: run_tests all clean
diff --git a/tools/testing/selftests/memfd/Makefile b/tools/testing/selftests/memfd/Makefile
index b80cd10d53ba..191dee9d6fd3 100644
--- a/tools/testing/selftests/memfd/Makefile
+++ b/tools/testing/selftests/memfd/Makefile
@@ -5,9 +5,9 @@ CFLAGS += -I../../../../include/
 all:
 	gcc $(CFLAGS) memfd_test.c -o memfd_test
 
-run_tests: all
-	gcc $(CFLAGS) memfd_test.c -o memfd_test
-	@./memfd_test || echo "memfd_test: [FAIL]"
+TEST_PROGS := memfd_test
+
+include ../lib.mk
 
 build_fuse:
 	gcc $(CFLAGS) fuse_mnt.c `pkg-config fuse --cflags --libs` -o fuse_mnt
diff --git a/tools/testing/selftests/memory-hotplug/Makefile b/tools/testing/selftests/memory-hotplug/Makefile
index d46b8d489cd2..8f7dea66ecac 100644
--- a/tools/testing/selftests/memory-hotplug/Makefile
+++ b/tools/testing/selftests/memory-hotplug/Makefile
@@ -1,7 +1,8 @@
 all:
 
-run_tests:
-	@/bin/bash ./on-off-test.sh -r 2 || echo "memory-hotplug selftests: [FAIL]"
+include ../lib.mk
+
+override RUN_TESTS := ./on-off-test.sh -r 2 || echo "selftests: memory-hotplug [FAIL]"
 
 run_full_test:
 	@/bin/bash ./on-off-test.sh || echo "memory-hotplug selftests: [FAIL]"
diff --git a/tools/testing/selftests/mount/Makefile b/tools/testing/selftests/mount/Makefile
index 337d853c2b72..06931dfd3ef0 100644
--- a/tools/testing/selftests/mount/Makefile
+++ b/tools/testing/selftests/mount/Makefile
@@ -5,13 +5,9 @@ all: unprivileged-remount-test
 unprivileged-remount-test: unprivileged-remount-test.c
 	gcc -Wall -O2 unprivileged-remount-test.c -o unprivileged-remount-test
 
-# Allow specific tests to be selected.
-test_unprivileged_remount: unprivileged-remount-test
-	@if [ -f /proc/self/uid_map ] ; then ./unprivileged-remount-test ; fi
+include ../lib.mk
 
-run_tests: all test_unprivileged_remount
+override RUN_TESTS := if [ -f /proc/self/uid_map ] ; then ./unprivileged-remount-test ; fi
 
 clean:
 	rm -f unprivileged-remount-test
-
-.PHONY: all test_unprivileged_remount
diff --git a/tools/testing/selftests/mqueue/Makefile b/tools/testing/selftests/mqueue/Makefile
index 8056e2e68fa4..cbc300ef11bf 100644
--- a/tools/testing/selftests/mqueue/Makefile
+++ b/tools/testing/selftests/mqueue/Makefile
@@ -2,9 +2,12 @@ all:
 	gcc -O2 mq_open_tests.c -o mq_open_tests -lrt
 	gcc -O2 -o mq_perf_tests mq_perf_tests.c -lrt -lpthread -lpopt
 
-run_tests:
-	@./mq_open_tests /test1 || echo "mq_open_tests: [FAIL]"
-	@./mq_perf_tests || echo "mq_perf_tests: [FAIL]"
+include ../lib.mk
+
+override define RUN_TESTS
+	@./mq_open_tests /test1 || echo "selftests: mq_open_tests [FAIL]"
+	@./mq_perf_tests || echo "selftests: mq_perf_tests [FAIL]"
+endef
 
 clean:
 	rm -f mq_open_tests mq_perf_tests
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 62f22cc9941c..fa8187ff15e6 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -11,9 +11,9 @@ all: $(NET_PROGS)
 %: %.c
 	$(CC) $(CFLAGS) -o $@ $^
 
-run_tests: all
-	@/bin/sh ./run_netsocktests || echo "sockettests: [FAIL]"
-	@/bin/sh ./run_afpackettests || echo "afpackettests: [FAIL]"
-	./test_bpf.sh
+TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh
+
+include ../lib.mk
+
 clean:
 	$(RM) $(NET_PROGS)
diff --git a/tools/testing/selftests/ptrace/Makefile b/tools/testing/selftests/ptrace/Makefile
index 47ae2d385ce8..453927fea90c 100644
--- a/tools/testing/selftests/ptrace/Makefile
+++ b/tools/testing/selftests/ptrace/Makefile
@@ -6,5 +6,6 @@ all: peeksiginfo
 clean:
 	rm -f peeksiginfo
 
-run_tests: all
-	@./peeksiginfo || echo "peeksiginfo selftests: [FAIL]"
+TEST_PROGS := peeksiginfo
+
+include ../lib.mk
diff --git a/tools/testing/selftests/size/Makefile b/tools/testing/selftests/size/Makefile
index 04dc25e4fa92..e4353d74ea6e 100644
--- a/tools/testing/selftests/size/Makefile
+++ b/tools/testing/selftests/size/Makefile
@@ -5,8 +5,9 @@ all: get_size
 get_size: get_size.c
 	$(CC) -static -ffreestanding -nostartfiles -s $< -o $@
 
-run_tests: all
-	./get_size
+TEST_PROGS := get_size
+
+include ../lib.mk
 
 clean:
 	$(RM) get_size
diff --git a/tools/testing/selftests/sysctl/Makefile b/tools/testing/selftests/sysctl/Makefile
index 0a92adaf0865..c9660f5ef9f9 100644
--- a/tools/testing/selftests/sysctl/Makefile
+++ b/tools/testing/selftests/sysctl/Makefile
@@ -4,16 +4,9 @@
 # No binaries, but make sure arg-less "make" doesn't trigger "run_tests".
 all:
 
-# Allow specific tests to be selected.
-test_num:
-	@/bin/sh ./run_numerictests
+TEST_PROGS := run_numerictests run_stringtests
 
-test_string:
-	@/bin/sh ./run_stringtests
-
-run_tests: all test_num test_string
+include ../lib.mk
 
 # Nothing to clean up.
 clean:
-
-.PHONY: all run_tests clean test_num test_string
diff --git a/tools/testing/selftests/timers/Makefile b/tools/testing/selftests/timers/Makefile
index eb2859f4ad21..149cee3b7b8a 100644
--- a/tools/testing/selftests/timers/Makefile
+++ b/tools/testing/selftests/timers/Makefile
@@ -1,8 +1,9 @@
 all:
 	gcc posix_timers.c -o posix_timers -lrt
 
-run_tests: all
-	./posix_timers
+TEST_PROGS := posix_timers
+
+include ../lib.mk
 
 clean:
 	rm -f ./posix_timers
diff --git a/tools/testing/selftests/user/Makefile b/tools/testing/selftests/user/Makefile
index 12c9d15bab07..d401b63c5b1a 100644
--- a/tools/testing/selftests/user/Makefile
+++ b/tools/testing/selftests/user/Makefile
@@ -3,5 +3,6 @@
 # No binaries, but make sure arg-less "make" doesn't trigger "run_tests"
 all:
 
-run_tests: all
-	./test_user_copy.sh
+TEST_PROGS := test_user_copy.sh
+
+include ../lib.mk
diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 4c4b1f631ecf..c0462182ec37 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -9,8 +9,9 @@ all: $(BINARIES)
 %: %.c
 	$(CC) $(CFLAGS) -o $@ $^
 
-run_tests: all
-	@/bin/sh ./run_vmtests || (echo "vmtests: [FAIL]"; exit 1)
+TEST_PROGS := run_vmtests
+
+include ../lib.mk
 
 clean:
 	$(RM) $(BINARIES)
-- 
2.1.0

^ permalink raw reply related

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
From: Josh Triplett @ 2015-01-09  5:21 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Andy Lutomirski, Miklos Szeredi, linux-kernel@vger.kernel.org,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Alexander Viro, Andrew Morton, Juri Lelli, Zach Brown,
	David Drysdale, Kees Cook, Alexei Starovoitov, David Herrmann,
	Dario Faggioli, Theodore Ts'o, Peter Zijlstra, Vivek Goyal,
	Mike Frysinger, Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu
In-Reply-To: <20150109044908.GA10966@fam-t430.nay.redhat.com>

On Fri, Jan 09, 2015 at 12:49:08PM +0800, Fam Zheng wrote:
> On Thu, 01/08 18:24, Andy Lutomirski wrote:
> > On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz@redhat.com> wrote:
> > > On Thu, 01/08 17:28, Andy Lutomirski wrote:
> > >> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
> > >> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> > >> >> I'd like to see a more ambitious change, since the timer isn't the
> > >> >> only problem like this.  Specifically, I'd like a syscall that does a
> > >> >> list of epoll-related things and then waits.  The list of things could
> > >> >> include, at least:
> > >> >>
> > >> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> > >> >> want to turn on and off their requests for events on a somewhat
> > >> >> regular basis.
> > >> >
> > >> > This sounds good to me.
> > >> >
> > >> >>
> > >> >>  - timerfd_settime actions: this allows a single syscall to wait and
> > >> >> adjust *both* monotonic and real-time wakeups.
> > >> >
> > >> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> > >>
> > >> Yes.  It's not very elegant, and more elegant ideas are welcome.
> > >
> > > What is the purpose of embedding timerfd operation here? Modifying timerfd
> > > for each poll doesn't sound a common pattern to me.
> > 
> > Setting a timeout is definitely a common pattern, hence this thread.
> > But the current timeout interface sucks, and people should really use
> > absolute time.  (My epoll software uses absolute time.)  But then
> > users need to decide whether to have their timeout based on the
> > monotonic clock or the realtime clock (or something else entirely).
> > Some bigger programs may want both -- they may have internal events
> > queued for certain times and for certain timeouts, and those should
> > use realtime and monotonic respectively.  Heck, users may also want
> > separate slack values on those.
> > 
> > Timerfd is the only thing we have right now that is anywhere near
> > flexible enough.  Obviously if epoll became fancy enough, then we
> > could do away with the timerfd entirely here.
> > 
> > >
> > >>
> > >> >
> > >> >>
> > >> >> Would this make sense?  It could look like:
> > >> >>
> > >> >> int epoll_mod_and_pwait(int epfd,
> > >> >>   struct epoll_event *events, int maxevents,
> > >> >>   struct epoll_command *commands, int ncommands,
> > >> >>   const sigset_t *sigmask);
> > >> >
> > >> > What about flags?
> > >> >
> > >>
> > >> No room.  Maybe it should just be a struct for everything instead of
> > >> separate args.
> > >
> > > Also no room for timeout. A single struct sounds the only way to go.
> > 
> > That's what timerfd is for.  I think it would be a bit weird to
> > support "timeout" and detailed timerfd control.
> 
> I see what you mean. Thanks.
> 
> I still don't like hooking timerfd in the interface. Besides the unclean
> interface, it also feels cubersome and overkill to let users setup and add a
> dedicated timerfd to implement timeout.
> 
> How about this:
> 
> int epoll_mod_wait(int epfd, struct epoll_mod_wait_data *data);
> 
> struct epoll_mod_wait_data {
> 	struct epoll_event *events;
> 	int maxevents;
> 	struct epoll_mod_cmd {
> 		int op,
> 		int fd;
> 		void *data;
> 	} *cmds;
> 	int ncmds;
> 	int flags;
> 	sigset_t *sigmask;
> };
> 
> Commands ops are:
> 
> 	EPOLL_CTL_ADD
> 		@fd is the fd to modify; @data is epoll_event.
> 	EPOLL_CTL_MOD
> 		@fd is the fd to modify; @data is epoll_event.
> 	EPOLL_CTL_DEL
> 		@fd is the fd to modify; @data is epoll_event.
> 
> 	EPOLL_CTL_SET_TIMEOUT
> 		@fd is ignored, @data is timespec.
> 		Clock type and relative/absolute are selected by flags as below.
> 
> Flags are given to override timeout defaults:
> 	EPOLL_FL_MONOTONIC_CLOCK
> 		If set, don't use realtime clock, use monotonic clock.
> 	EPOLL_FL_ABSOLUTE_TIMEOUT
> 		If set, don't use relative timeout, use absolute timeout.

I'd suggest using an "int clockid" field instead, like timerfd_settime;
even if it only accepts CLOCK_REALTIME and CLOCK_MONOTONIC, if it needs
extending in the future, it'd be painful to have to remap new CLOCK_*
constants into the EPOLL_FL_* namespace.  (I do think dropping timeouts
in favor of timerfds makes things more nicely orthogonal, but epoll_wait
already has a timeout parameter, so *shrug*.)

Also, I think that structure has too many levels of indirection; it'd
produce many unnecessary cache misses; considering you're trying to
eliminate the overhead of one or two extra syscalls, you don't want to
introduce a pile of unnecessary cache misses in the processes.  I'd
suggest inlining cmds as an array at the end of the structure, and
turning "void *data" into an inline epoll_event.  (Or, you could use
"events" as an in/out parameter.)

You could drop EPOLL_CTL_SET_TIMEOUT, and just include a clockid and
timespec directly in the top-level structure.

And I'd suggest either making flags a top-level parameter or putting it
at the start of the structure, to make future extension easier.

</bikeshed>

- Josh Triplett

^ permalink raw reply

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
From: Fam Zheng @ 2015-01-09  4:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Miklos Szeredi, linux-kernel@vger.kernel.org, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, X86 ML, Alexander Viro,
	Andrew Morton, Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers
In-Reply-To: <CALCETrU+MKWCh1FBqvuf3MGo3dTcmdktcM4aVZZGGUAmDNoWtg@mail.gmail.com>

On Thu, 01/08 18:24, Andy Lutomirski wrote:
> On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz@redhat.com> wrote:
> > On Thu, 01/08 17:28, Andy Lutomirski wrote:
> >> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
> >> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> >> >> I'd like to see a more ambitious change, since the timer isn't the
> >> >> only problem like this.  Specifically, I'd like a syscall that does a
> >> >> list of epoll-related things and then waits.  The list of things could
> >> >> include, at least:
> >> >>
> >> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> >> >> want to turn on and off their requests for events on a somewhat
> >> >> regular basis.
> >> >
> >> > This sounds good to me.
> >> >
> >> >>
> >> >>  - timerfd_settime actions: this allows a single syscall to wait and
> >> >> adjust *both* monotonic and real-time wakeups.
> >> >
> >> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> >>
> >> Yes.  It's not very elegant, and more elegant ideas are welcome.
> >
> > What is the purpose of embedding timerfd operation here? Modifying timerfd
> > for each poll doesn't sound a common pattern to me.
> 
> Setting a timeout is definitely a common pattern, hence this thread.
> But the current timeout interface sucks, and people should really use
> absolute time.  (My epoll software uses absolute time.)  But then
> users need to decide whether to have their timeout based on the
> monotonic clock or the realtime clock (or something else entirely).
> Some bigger programs may want both -- they may have internal events
> queued for certain times and for certain timeouts, and those should
> use realtime and monotonic respectively.  Heck, users may also want
> separate slack values on those.
> 
> Timerfd is the only thing we have right now that is anywhere near
> flexible enough.  Obviously if epoll became fancy enough, then we
> could do away with the timerfd entirely here.
> 
> >
> >>
> >> >
> >> >>
> >> >> Would this make sense?  It could look like:
> >> >>
> >> >> int epoll_mod_and_pwait(int epfd,
> >> >>   struct epoll_event *events, int maxevents,
> >> >>   struct epoll_command *commands, int ncommands,
> >> >>   const sigset_t *sigmask);
> >> >
> >> > What about flags?
> >> >
> >>
> >> No room.  Maybe it should just be a struct for everything instead of
> >> separate args.
> >
> > Also no room for timeout. A single struct sounds the only way to go.
> 
> That's what timerfd is for.  I think it would be a bit weird to
> support "timeout" and detailed timerfd control.

I see what you mean. Thanks.

I still don't like hooking timerfd in the interface. Besides the unclean
interface, it also feels cubersome and overkill to let users setup and add a
dedicated timerfd to implement timeout.

How about this:

int epoll_mod_wait(int epfd, struct epoll_mod_wait_data *data);

struct epoll_mod_wait_data {
	struct epoll_event *events;
	int maxevents;
	struct epoll_mod_cmd {
		int op,
		int fd;
		void *data;
	} *cmds;
	int ncmds;
	int flags;
	sigset_t *sigmask;
};

Commands ops are:

	EPOLL_CTL_ADD
		@fd is the fd to modify; @data is epoll_event.
	EPOLL_CTL_MOD
		@fd is the fd to modify; @data is epoll_event.
	EPOLL_CTL_DEL
		@fd is the fd to modify; @data is epoll_event.

	EPOLL_CTL_SET_TIMEOUT
		@fd is ignored, @data is timespec.
		Clock type and relative/absolute are selected by flags as below.

Flags are given to override timeout defaults:
	EPOLL_FL_MONOTONIC_CLOCK
		If set, don't use realtime clock, use monotonic clock.
	EPOLL_FL_ABSOLUTE_TIMEOUT
		If set, don't use relative timeout, use absolute timeout.

Thanks,
Fam

^ permalink raw reply

* Re: [PATCH v8 1/2] crypto: AF_ALG: add AEAD support
From: Stephan Mueller @ 2015-01-09  3:30 UTC (permalink / raw)
  To: Herbert Xu
  Cc: 'Quentin Gouchet', Daniel Borkmann,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20150108110931.GA8568-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>

Am Donnerstag, 8. Januar 2015, 22:09:31 schrieb Herbert Xu:

Hi Herbert,

> On Wed, Jan 07, 2015 at 04:51:38PM +0100, Stephan Mueller wrote:
> > +		if (!aead_writable(sk)) {
> > +			/*
> > +			 * If there is more data to be expected, but we cannot
> > +			 * write more data, forcefully define that we do not
> > +			 * expect more data to invoke the AEAD operation. This
> > +			 * prevents a deadlock in user space.
> > +			 */
> > +			ctx->more = 0;
> 
> We should return EMSGSIZE here.  Also we should clear out the
> existing data so that the socket may be reused again.

Is this really wise considering that we want to support a threaded caller? For 
example, one thread sends data and another reads data. For some reason, the 
reading thread is throttled or slower than the sender. Now, with the current 
solution, the sender is put on hold (i.e. throttled) until the reader can 
catch up. I.e. we have an automated synchronization between sender/receiver.

Thus, when we remove the wait here and return an error, the sender will be 
shut down and there is no synchronization of the reader/writer any more.

Note, the same applies to the very similar code in aead_sendpage too.
> 
> > +	ctx->more = msg->msg_flags & MSG_MORE;
> > +	if (!ctx->more && !aead_sufficient_data(ctx))
> > +		err = -EINVAL;
> 
> Ditto, we should discard the data that's queued up.  Also perhaps
> use EBADMSG instead of EINVAL.

Agreed that we should clear out the buffer. I will provide that in the next 
release for both, the sendmsg and sendpage implementations.

However, I am not sure whether using EBADMSG is a good idea. The error of 
EBADMSG in the kernel crypto API is only used for integrity errors of AEAD 
ciphers. But our error case here has nothing to do with the integrity error.

I would be fine with any other error number -- EMSGSIZE as you suggested 
above?
> 
> > +	/*
> > +	 * Require exactly one IOV block as the AEAD operation is a one shot
> > +	 * due to the authentication tag.
> > +	 */
> > +	if (msg->msg_iter.nr_segs != 1)
> > +		return -ENOMSG;
> 
> Why does the receive buffer have to be contiguous?

I thought for quite some time about how we can use multiple iovecs. But I 
found no satisfactory solution. The solution I see is described below.

If we consider the skcipher implementation, the code iterates over the iovecs 
as the outermost loop. In each loop iteration the cipher operation is 
triggered.

Now in case of AEAD, all provided data the kernel received has to be operated 
on with exactly one cipher invocation. Looking at skcipher, that would mean 
that we only perform one loop iteration, i.e. processing exactly one iovec.

A possible solution would be that I use an array of struct af_alg_sgl and 
iterate over that array for each iovec. Something like the following:

struct aead_ctx {
...
	struct af_alg_sgl rsgl[ALG_MAX_PAGES];

for(i = 0; i < ALG_MAX_PAGES; i++) {
	af_alg_make_sg(rsgl[i], iov[i])
	scatterwalk_sg_chain(rsgl[i-1].sg rsgl[i]);
}

But my concern is that with the array of rsgl we occupy a sizable amount of 
memory as af_alg_sgl again defines arrays of entries.

Do you think whether such approach makes sense? If yes, which limit to the 
number of rsgl should we apply -- is ALG_MAX_PAGES good?

-- 
Ciao
Stephan

^ permalink raw reply

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
From: Andy Lutomirski @ 2015-01-09  2:24 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Miklos Szeredi,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Alexander Viro, Andrew Morton, Juri Lelli, Zach Brown,
	David Drysdale, Kees Cook, Alexei Starovoitov, David Herrmann,
	Dario Faggioli, Theodore Ts'o, Peter Zijlstra, Vivek Goyal,
	Mike Frysinger, Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers
In-Reply-To: <20150109015248.GA5034-+wGkCoP0yD+sDdueE5tM26fLeoKvNuZc@public.gmane.org>

On Thu, Jan 8, 2015 at 5:52 PM, Fam Zheng <famz-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Thu, 01/08 17:28, Andy Lutomirski wrote:
>> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
>> >> I'd like to see a more ambitious change, since the timer isn't the
>> >> only problem like this.  Specifically, I'd like a syscall that does a
>> >> list of epoll-related things and then waits.  The list of things could
>> >> include, at least:
>> >>
>> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
>> >> want to turn on and off their requests for events on a somewhat
>> >> regular basis.
>> >
>> > This sounds good to me.
>> >
>> >>
>> >>  - timerfd_settime actions: this allows a single syscall to wait and
>> >> adjust *both* monotonic and real-time wakeups.
>> >
>> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
>>
>> Yes.  It's not very elegant, and more elegant ideas are welcome.
>
> What is the purpose of embedding timerfd operation here? Modifying timerfd
> for each poll doesn't sound a common pattern to me.

Setting a timeout is definitely a common pattern, hence this thread.
But the current timeout interface sucks, and people should really use
absolute time.  (My epoll software uses absolute time.)  But then
users need to decide whether to have their timeout based on the
monotonic clock or the realtime clock (or something else entirely).
Some bigger programs may want both -- they may have internal events
queued for certain times and for certain timeouts, and those should
use realtime and monotonic respectively.  Heck, users may also want
separate slack values on those.

Timerfd is the only thing we have right now that is anywhere near
flexible enough.  Obviously if epoll became fancy enough, then we
could do away with the timerfd entirely here.

>
>>
>> >
>> >>
>> >> Would this make sense?  It could look like:
>> >>
>> >> int epoll_mod_and_pwait(int epfd,
>> >>   struct epoll_event *events, int maxevents,
>> >>   struct epoll_command *commands, int ncommands,
>> >>   const sigset_t *sigmask);
>> >
>> > What about flags?
>> >
>>
>> No room.  Maybe it should just be a struct for everything instead of
>> separate args.
>
> Also no room for timeout. A single struct sounds the only way to go.

That's what timerfd is for.  I think it would be a bit weird to
support "timeout" and detailed timerfd control.

--Andy

^ permalink raw reply

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
From: Fam Zheng @ 2015-01-09  1:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Miklos Szeredi, linux-kernel@vger.kernel.org, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, X86 ML, Alexander Viro,
	Andrew Morton, Juri Lelli, Zach Brown, David Drysdale, Kees Cook,
	Alexei Starovoitov, David Herrmann, Dario Faggioli,
	Theodore Ts'o, Peter Zijlstra, Vivek Goyal, Mike Frysinger,
	Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers
In-Reply-To: <CALCETrWHGwPw2cH3RruaCFKj2ND8PdEbdWjTi4OJ8m2eF9fpLw@mail.gmail.com>

On Thu, 01/08 17:28, Andy Lutomirski wrote:
> On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz@redhat.com> wrote:
> > On Thu, 01/08 09:57, Andy Lutomirski wrote:
> >> I'd like to see a more ambitious change, since the timer isn't the
> >> only problem like this.  Specifically, I'd like a syscall that does a
> >> list of epoll-related things and then waits.  The list of things could
> >> include, at least:
> >>
> >>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
> >> want to turn on and off their requests for events on a somewhat
> >> regular basis.
> >
> > This sounds good to me.
> >
> >>
> >>  - timerfd_settime actions: this allows a single syscall to wait and
> >> adjust *both* monotonic and real-time wakeups.
> >
> > I'm not sure, doesn't this break orthogonality between epoll and timerfd?
> 
> Yes.  It's not very elegant, and more elegant ideas are welcome.

What is the purpose of embedding timerfd operation here? Modifying timerfd
for each poll doesn't sound a common pattern to me.

> 
> >
> >>
> >> Would this make sense?  It could look like:
> >>
> >> int epoll_mod_and_pwait(int epfd,
> >>   struct epoll_event *events, int maxevents,
> >>   struct epoll_command *commands, int ncommands,
> >>   const sigset_t *sigmask);
> >
> > What about flags?
> >
> 
> No room.  Maybe it should just be a struct for everything instead of
> separate args.

Also no room for timeout. A single struct sounds the only way to go.

Fam

^ permalink raw reply

* Re: [PATCH 0/3] epoll: Add epoll_pwait1 syscall
From: Andy Lutomirski @ 2015-01-09  1:28 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Miklos Szeredi,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Alexander Viro, Andrew Morton, Juri Lelli, Zach Brown,
	David Drysdale, Kees Cook, Alexei Starovoitov, David Herrmann,
	Dario Faggioli, Theodore Ts'o, Peter Zijlstra, Vivek Goyal,
	Mike Frysinger, Heiko Carstens, Rasmus Villemoes, Oleg Nesterov,
	Mathieu Desnoyers
In-Reply-To: <20150109011608.GA2924-+wGkCoP0yD+sDdueE5tM26fLeoKvNuZc@public.gmane.org>

On Thu, Jan 8, 2015 at 5:25 PM, Fam Zheng <famz-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Thu, 01/08 09:57, Andy Lutomirski wrote:
>> On Thu, Jan 8, 2015 at 1:12 AM, Miklos Szeredi <mszeredi-AlSwsSmVLrQ@public.gmane.org> wrote:
>> > On Thu, 2015-01-08 at 16:25 +0800, Fam Zheng wrote:
>> >> Applications could use epoll interface when then need to poll a big number of
>> >> files in their main loops, to achieve better performance than ppoll(2). Except
>> >> for one concern: epoll only takes timeout parameters in microseconds, rather
>> >> than nanoseconds.
>> >>
>> >> That is a drawback we should address. For a real case in QEMU, we run into a
>> >> scalability issue with ppoll(2) when many devices are attached to guest, in
>> >> which case many host fds, such as virtual disk images and sockets, need to be
>> >> polled by the main loop. As a result we are looking at switching to epoll, but
>> >> the coarse timeout precision is a trouble, as explained below.
>> >>
>> >> We're already using prctl(PR_SET_TIMERSLACK, 1) which is necessary to implement
>> >> timers in the main loop; and we call ppoll(2) with the next firing timer as
>> >> timeout, so when ppoll(2) returns, we know that we have more work to do (either
>> >> handling IO events, or fire a timer callback). This is natual and efficient,
>> >> except that ppoll(2) itself is slow.
>> >>
>> >> Now that we want to switch to epoll, to speed up the polling. However the timer
>> >> slack setting will be effectively undone, because that way we will have to
>> >> round up the timeout to microseconds honoring timer contract. But consequently,
>> >> this hurts the general responsiveness.
>> >>
>> >> Note: there are two alternatives, without changing kernel:
>> >>
>> >> 1) Leading ppoll(2), with the epollfd only and a nanosecond timeout. It won't
>> >> be slow as one fd is polled. No more scalability issue. And if there are
>> >> events, we know from ppoll(2)'s return, then we do the epoll_wait(2) with
>> >> timeout=0; otherwise, there can't be events for the epoll, skip the following
>> >> epoll_wait and just continue with other work.
>> >>
>> >> 2) Setup and add a timerfd to epoll, then we do epoll_wait(..., timeout=-1).
>> >> The timerfd will hopefully force epoll_wait to return when it timeouts, even if
>> >> no other events have arrived. This will inheritly give us timerfd's precision.
>> >> Note that for each poll, the desired timeout is different because the next
>> >> timer is different, so that, before each epoll_wait(2), there will be a
>> >> timerfd_settime syscall to set it to a proper value.
>> >>
>> >> Unfortunately, both approaches require one more syscall per iteration, compared
>> >> to the original single ppoll(2), cost of which is unneglectable when we talk
>> >> about nanosecond granularity.
>>
>> I'd like to see a more ambitious change, since the timer isn't the
>> only problem like this.  Specifically, I'd like a syscall that does a
>> list of epoll-related things and then waits.  The list of things could
>> include, at least:
>>
>>  - EPOLL_CTL_MOD actions: level-triggered epoll users are likely to
>> want to turn on and off their requests for events on a somewhat
>> regular basis.
>
> This sounds good to me.
>
>>
>>  - timerfd_settime actions: this allows a single syscall to wait and
>> adjust *both* monotonic and real-time wakeups.
>
> I'm not sure, doesn't this break orthogonality between epoll and timerfd?

Yes.  It's not very elegant, and more elegant ideas are welcome.

>
>>
>> Would this make sense?  It could look like:
>>
>> int epoll_mod_and_pwait(int epfd,
>>   struct epoll_event *events, int maxevents,
>>   struct epoll_command *commands, int ncommands,
>>   const sigset_t *sigmask);
>
> What about flags?
>

No room.  Maybe it should just be a struct for everything instead of
separate args.

> Fam



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox