Linux userland API discussions
 help / color / mirror / Atom feed
* Re: [PATCH 4/5] pidfd: add CLONE_WAIT_PID
From: Oleg Nesterov @ 2019-07-25 16:13 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-kernel, arnd, ebiederm, keescook, joel, tglx, tj, dhowells,
	jannh, luto, akpm, cyphar, torvalds, viro, kernel-team,
	Ingo Molnar, Peter Zijlstra, linux-api
In-Reply-To: <20190725122650.4i3arct5rpchqmyt@brauner.io>

On 07/25, Christian Brauner wrote:
>
> The key is that you want to be able to create child processes in a
> shared library without the main programing having to know about this so
> that it can use P_ALL and never get stuff from the library.

OK, thanks...

in this case you should probablu pass 0 in CSIGNAL to ensure that the main
program won't be notified when that child exits.

Oleg.

^ permalink raw reply

* Re: [PATCH 2/5] pidfd: add pidfd_wait()
From: Eric W. Biederman @ 2019-07-25 14:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christian Brauner, Linux List Kernel Mailing, Oleg Nesterov,
	Arnd Bergmann, Kees Cook, Joel Fernandes, Thomas Gleixner,
	Tejun Heo, David Howells, Jann Horn, Andrew Lutomirski,
	Andrew Morton, Aleksa Sarai, Al Viro, Android Kernel Team,
	Linux API
In-Reply-To: <CAHk-=whZPKzbPQftNGFB=iaSZGTSXNkhUASWF2V53MwB+A4zAQ@mail.gmail.com>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Wed, Jul 24, 2019 at 7:47 AM Christian Brauner <christian@brauner.io> wrote:
>>
>> This adds the pidfd_wait() syscall.
>
> I despise this patch.
>
> Why can't this just be a new P_PIDFD flag, and then use
> "waitid(P_PIDFD, pidfd, ...);"
>
> Yes, yes, yes, I realize that "pidfd" is of type "int", and waitid()
> takes an argument of type pid_t, but it's the same type in the end,
> and it does seem like the whole *point* of "waitid()" is that
> "idtype_t idtype" which tells you what kind of ID you're passing it.

Just to emphasize this point.

The posix declaration of waitid is:
>int waitid(idtype_t idtype, id_t id, siginfo_t *infop, int options);

Where id_t is defined as:
> id_t - Used as a general identifier; can be used to contain at least a pid_t, uid_t, or gid_t.

And the BSDs at least have defined P_UID and P_GID.  So that flexibility
has been used.

So it looks entirely reasonable to have P_PIDFD that just waits for a
specified child.

Eric

^ permalink raw reply

* Re: [PATCH 4/5] pidfd: add CLONE_WAIT_PID
From: Christian Brauner @ 2019-07-25 12:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, arnd, ebiederm, keescook, joel, tglx, tj, dhowells,
	jannh, luto, akpm, cyphar, torvalds, viro, kernel-team,
	Ingo Molnar, Peter Zijlstra, linux-api
In-Reply-To: <20190725114359.GH4707@redhat.com>

On Thu, Jul 25, 2019 at 01:43:59PM +0200, Oleg Nesterov wrote:
> Or. We can change wait_consider_task() to not clear ->notask_error if
> WXXX and the child is PF_WAIT_PID.
> 
> This way you can "safely" use wait() without WNOHANG, it won't block if
> all the children which can report an even are PF_WAIT_PID.
> 
> But I do not understand your use-cases, I have no idea if this can help

One usecase (among others listed in the commit message) are shared
libraries. P_ALL is usually something you can't really use in a shared
library because you have no idea what else might be fork()ed off. Only
the main program can use this but none of the auxiliary libraries that
it uses.
The other way around you want to be able fork() off something without
affecting P_ALL in the main program.
The key is that you want to be able to create child processes in a
shared library without the main programing having to know about this so
that it can use P_ALL and never get stuff from the library.

Assume you have a project with a main loop with a million things
happening in that mainloop like some gui app running an avi video. For
example, gtk uses gstreamer which forks off all codecs in child
processes which are sandboxed for security. So gstreamer is using helper
processes in the background which are my children now. Now I'm creating
four more additional helper processes as well. Now, in my (glib, qt
whatever) mainloop on SIGCHLD some part of the app is checking with
WNHOANG and finds a process has exited. It's cleaning this thing up now
but it's not a process it wanted to clean up. The other part of the app
is now doing waitid(P_PID, pid) but will find the process already gone
it wanted to reap.

I hope I'm expressing this well enough.

^ permalink raw reply

* Re: [PATCH 4/5] pidfd: add CLONE_WAIT_PID
From: Oleg Nesterov @ 2019-07-25 11:43 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-kernel, arnd, ebiederm, keescook, joel, tglx, tj, dhowells,
	jannh, luto, akpm, cyphar, torvalds, viro, kernel-team,
	Ingo Molnar, Peter Zijlstra, linux-api
In-Reply-To: <20190725112503.GG4707@redhat.com>

Or. We can change wait_consider_task() to not clear ->notask_error if
WXXX and the child is PF_WAIT_PID.

This way you can "safely" use wait() without WNOHANG, it won't block if
all the children which can report an even are PF_WAIT_PID.

But I do not understand your use-cases, I have no idea if this can help
or not. Just I think the more discussion is always better when we are
going to add the new API.


On 07/25, Oleg Nesterov wrote:
>
> On 07/25, Christian Brauner wrote:
> >
> > On Thu, Jul 25, 2019 at 12:35:44PM +0200, Oleg Nesterov wrote:
> > >
> > > I have to admit this feature looks a bit exotic to me...
> >
> > It might look like it from the kernels perspective but from the feedback
> > on this when presenting on this userspace has real usecases for this.
> 
> OK...
> 
> but then perhaps we can make PF_WAIT_PID more flexible.
> 
> Say, we can add the new WXXX wait option and change eligible_child()
> 
> 	if ((p->flags & PF_WAIT_PID) && (wo->options & WXXX))
> 		return 0;
> 
> this way the parent can tell waitid() whether the PF_WAIT_PID tasks should
> be filtered or not.
> 
> And if we do this we can even add PR_SET_WAIT_PID/PR_CLR_WAIT_PID instead
> of the new CLONE_ flag.
> 
> Oleg.

^ permalink raw reply

* Re: [PATCH 4/5] pidfd: add CLONE_WAIT_PID
From: Christian Brauner @ 2019-07-25 11:41 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, arnd, ebiederm, keescook, joel, tglx, tj, dhowells,
	jannh, luto, akpm, cyphar, torvalds, viro, kernel-team,
	Ingo Molnar, Peter Zijlstra, linux-api
In-Reply-To: <20190725112503.GG4707@redhat.com>

On Thu, Jul 25, 2019 at 01:25:03PM +0200, Oleg Nesterov wrote:
> On 07/25, Christian Brauner wrote:
> >
> > On Thu, Jul 25, 2019 at 12:35:44PM +0200, Oleg Nesterov wrote:
> > >
> > > I have to admit this feature looks a bit exotic to me...
> >
> > It might look like it from the kernels perspective but from the feedback
> > on this when presenting on this userspace has real usecases for this.
> 
> OK...
> 
> but then perhaps we can make PF_WAIT_PID more flexible.

I've changed this to be a property on signal_struct, i.e. a bitfield
entry following the {is,has}_child_subreaper entries.

> 
> Say, we can add the new WXXX wait option and change eligible_child()
> 
> 	if ((p->flags & PF_WAIT_PID) && (wo->options & WXXX))
> 		return 0;
> 
> this way the parent can tell waitid() whether the PF_WAIT_PID tasks should
> be filtered or not.
> 
> And if we do this we can even add PR_SET_WAIT_PID/PR_CLR_WAIT_PID instead
> of the new CLONE_ flag.

Hm, I prefer to not do this.
The idea is that this is a property of the pidfd itself and not of the
waitid() call. And currently, I don't think it makes sense to change
this property at runtime.
What might make sense is to remove this property when a task gets
reparented, i.e. when its parent has died.

^ permalink raw reply

* Re: [PATCH v12 3/6] sched/core: uclamp: Propagate system defaults to root group
From: Michal Koutný @ 2019-07-25 11:41 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: cgroups, linux-api, linux-kernel, linux-pm, Alessio Balsini,
	Dietmar Eggemann, Morten Rasmussen, Quentin Perret,
	Joel Fernandes, Paul Turner, Steve Muckle, Suren Baghdasaryan,
	Todd Kjos, Peter Zijlstra, Rafael J . Wysocki, Tejun Heo,
	Vincent Guittot, Viresh Kumar, Juri Lelli, Ingo Molnar
In-Reply-To: <20190718181748.28446-4-patrick.bellasi@arm.com>

[-- Attachment #1: Type: text/plain, Size: 1422 bytes --]

On Thu, Jul 18, 2019 at 07:17:45PM +0100, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> The clamp values are not tunable at the level of the root task group.
> That's for two main reasons:
> 
>  - the root group represents "system resources" which are always
>    entirely available from the cgroup standpoint.
> 
>  - when tuning/restricting "system resources" makes sense, tuning must
>    be done using a system wide API which should also be available when
>    control groups are not.
> 
> When a system wide restriction is available, cgroups should be aware of
> its value in order to know exactly how much "system resources" are
> available for the subgroups.
IIUC, the global default would apply in uclamp_eff_get(), so this
propagation isn't strictly necessary in order to apply to tasks (that's
how it works under !CONFIG_UCLAMP_TASK_GROUP).
The reason is that effective value (which isn't exposed currently) in a
group takes into account this global restriction, right?


> @@ -1043,12 +1063,17 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> [...]
> +	if (update_root_tg)
> +		uclamp_update_root_tg();
> +
>  	/*
>  	 * Updating all the RUNNABLE task is expensive, keep it simple and do
>  	 * just a lazy update at each next enqueue time.
Since uclamp_update_root_tg() traverses down to
uclamp_update_active_tasks() is this comment half true now?


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v12 1/6] sched/core: uclamp: Extend CPU's cgroup controller
From: Michal Koutný @ 2019-07-25 11:41 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: cgroups, linux-api, linux-kernel, linux-pm, Alessio Balsini,
	Dietmar Eggemann, Morten Rasmussen, Quentin Perret,
	Joel Fernandes, Paul Turner, Steve Muckle, Suren Baghdasaryan,
	Todd Kjos, Peter Zijlstra, Rafael J . Wysocki, Tejun Heo,
	Vincent Guittot, Viresh Kumar, Juri Lelli, Ingo Molnar
In-Reply-To: <20190718181748.28446-2-patrick.bellasi@arm.com>

[-- Attachment #1: Type: text/plain, Size: 534 bytes --]

On Thu, Jul 18, 2019 at 07:17:43PM +0100, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> +static ssize_t cpu_uclamp_min_write(struct kernfs_open_file *of,
> +				    char *buf, size_t nbytes,
> +				    loff_t off)
> +{
> [...]
> +static ssize_t cpu_uclamp_max_write(struct kernfs_open_file *of,
> +				    char *buf, size_t nbytes,
> +				    loff_t off)
> +{
> [...]
These two functions are almost identical yet not trivial. I think it
wouldn be better to have the code at one place only and distinguish by
the passed clamp_id.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 4/5] pidfd: add CLONE_WAIT_PID
From: Oleg Nesterov @ 2019-07-25 11:25 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-kernel, arnd, ebiederm, keescook, joel, tglx, tj, dhowells,
	jannh, luto, akpm, cyphar, torvalds, viro, kernel-team,
	Ingo Molnar, Peter Zijlstra, linux-api
In-Reply-To: <20190725104006.7myahvjtnbcgu3in@brauner.io>

On 07/25, Christian Brauner wrote:
>
> On Thu, Jul 25, 2019 at 12:35:44PM +0200, Oleg Nesterov wrote:
> >
> > I have to admit this feature looks a bit exotic to me...
>
> It might look like it from the kernels perspective but from the feedback
> on this when presenting on this userspace has real usecases for this.

OK...

but then perhaps we can make PF_WAIT_PID more flexible.

Say, we can add the new WXXX wait option and change eligible_child()

	if ((p->flags & PF_WAIT_PID) && (wo->options & WXXX))
		return 0;

this way the parent can tell waitid() whether the PF_WAIT_PID tasks should
be filtered or not.

And if we do this we can even add PR_SET_WAIT_PID/PR_CLR_WAIT_PID instead
of the new CLONE_ flag.

Oleg.

^ permalink raw reply

* Re: [PATCH 4/5] pidfd: add CLONE_WAIT_PID
From: Christian Brauner @ 2019-07-25 10:40 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, arnd, ebiederm, keescook, joel, tglx, tj, dhowells,
	jannh, luto, akpm, cyphar, torvalds, viro, kernel-team,
	Ingo Molnar, Peter Zijlstra, linux-api
In-Reply-To: <20190725103543.GF4707@redhat.com>

On Thu, Jul 25, 2019 at 12:35:44PM +0200, Oleg Nesterov wrote:
> On 07/24, Christian Brauner wrote:
> >
> > If CLONE_WAIT_PID is set the newly created process will not be
> > considered by process wait requests that wait generically on children
> > such as:
> 
> I have to admit this feature looks a bit exotic to me...

It might look like it from the kernels perspective but from the feedback
on this when presenting on this userspace has real usecases for this.

> 
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -1019,6 +1019,9 @@ eligible_child(struct wait_opts *wo, bool ptrace, struct task_struct *p)
> >  	if (!eligible_pid(wo, p))
> >  		return 0;
> >
> > +	if ((p->flags & PF_WAIT_PID) && (wo->wo_type != PIDTYPE_PID))
> > +		return 0;
> 
> Even if ptrace == T ?
> 
> This doesn't look right. Say, strace should work even if its tracee (or
> one of the tracees) has PF_WAIT_PID.

As in
	if (!ptrace && (p->flags & PF_WAIT_PID) && (wo->wo_type != PIDTYPE_PID))
		return 0;

Sure, we can allow that.

Christian

^ permalink raw reply

* Re: [PATCH 4/5] pidfd: add CLONE_WAIT_PID
From: Christian Brauner @ 2019-07-25 10:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jann Horn, kernel list, Arnd Bergmann, Eric W. Biederman,
	Kees Cook, Joel Fernandes (Google), Thomas Gleixner, Tejun Heo,
	David Howells, Andy Lutomirski, Andrew Morton, Aleksa Sarai,
	Linus Torvalds, Al Viro, kernel-team, Ingo Molnar, Peter Zijlstra,
	Linux API
In-Reply-To: <20190725103048.GE4707@redhat.com>

On Thu, Jul 25, 2019 at 12:30:48PM +0200, Oleg Nesterov wrote:
> On 07/24, Jann Horn wrote:
> >
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -1902,6 +1902,10 @@ static __latent_entropy struct task_struct *copy_process(
> >         delayacct_tsk_init(p);  /* Must remain after dup_task_struct() */
> >         p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE);
> >         p->flags |= PF_FORKNOEXEC;
> > +       if (!(clone_flags & CLONE_THREAD))
> > +               p->flags &= ~PF_PF_WAIT_PID;
> > +       if (clone_flags & CLONE_WAIT_PID)
> > +               p->flags |= PF_PF_WAIT_PID;
> 
> agreed, but then the "if (!thread_group_leader(tsk))" block in de_thread()
> should also copy PF_PF_WAIT_PID.
> 
> > An alternative would be to not use p->flags at all, but instead make
> > this a property of the signal_struct - since the property is shared by
> > all threads, that might make more sense?
> 
> I tend to agree.

Hm, ok. That's two people that prefer to make this a flag in
signal_struct. Ok, let me adapt the patch.

Thanks!
Christian

^ permalink raw reply

* Re: [PATCH 4/5] pidfd: add CLONE_WAIT_PID
From: Oleg Nesterov @ 2019-07-25 10:35 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-kernel, arnd, ebiederm, keescook, joel, tglx, tj, dhowells,
	jannh, luto, akpm, cyphar, torvalds, viro, kernel-team,
	Ingo Molnar, Peter Zijlstra, linux-api
In-Reply-To: <20190724144651.28272-5-christian@brauner.io>

On 07/24, Christian Brauner wrote:
>
> If CLONE_WAIT_PID is set the newly created process will not be
> considered by process wait requests that wait generically on children
> such as:

I have to admit this feature looks a bit exotic to me...

> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -1019,6 +1019,9 @@ eligible_child(struct wait_opts *wo, bool ptrace, struct task_struct *p)
>  	if (!eligible_pid(wo, p))
>  		return 0;
>
> +	if ((p->flags & PF_WAIT_PID) && (wo->wo_type != PIDTYPE_PID))
> +		return 0;

Even if ptrace == T ?

This doesn't look right. Say, strace should work even if its tracee (or
one of the tracees) has PF_WAIT_PID.

Oleg.

^ permalink raw reply

* Re: [PATCH 4/5] pidfd: add CLONE_WAIT_PID
From: Oleg Nesterov @ 2019-07-25 10:30 UTC (permalink / raw)
  To: Jann Horn
  Cc: Christian Brauner, kernel list, Arnd Bergmann, Eric W. Biederman,
	Kees Cook, Joel Fernandes (Google), Thomas Gleixner, Tejun Heo,
	David Howells, Andy Lutomirski, Andrew Morton, Aleksa Sarai,
	Linus Torvalds, Al Viro, kernel-team, Ingo Molnar, Peter Zijlstra,
	Linux API
In-Reply-To: <CAG48ez1vd4Yhd3DqHVjTWM-N0MaNnX9n8MNV7MEyU5m3XDu+kQ@mail.gmail.com>

On 07/24, Jann Horn wrote:
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1902,6 +1902,10 @@ static __latent_entropy struct task_struct *copy_process(
>         delayacct_tsk_init(p);  /* Must remain after dup_task_struct() */
>         p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE);
>         p->flags |= PF_FORKNOEXEC;
> +       if (!(clone_flags & CLONE_THREAD))
> +               p->flags &= ~PF_PF_WAIT_PID;
> +       if (clone_flags & CLONE_WAIT_PID)
> +               p->flags |= PF_PF_WAIT_PID;

agreed, but then the "if (!thread_group_leader(tsk))" block in de_thread()
should also copy PF_PF_WAIT_PID.

> An alternative would be to not use p->flags at all, but instead make
> this a property of the signal_struct - since the property is shared by
> all threads, that might make more sense?

I tend to agree.

Oleg.

^ permalink raw reply

* Re: [PATCH 2/5] pidfd: add pidfd_wait()
From: Christian Brauner @ 2019-07-25 10:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, arnd, ebiederm, keescook, joel, tglx, tj, dhowells,
	jannh, luto, akpm, cyphar, torvalds, viro, kernel-team, linux-api
In-Reply-To: <20190725101626.GD4707@redhat.com>

On Thu, Jul 25, 2019 at 12:16:27PM +0200, Oleg Nesterov wrote:
> On 07/24, Christian Brauner wrote:
> >
> > +SYSCALL_DEFINE6(pidfd_wait, int, pidfd, int __user *, stat_addr,
> > +		siginfo_t __user *, info, struct rusage __user *, ru,
> > +		unsigned int, states, unsigned int, flags)
> > +{
> 
> Oh, I too think that P_PIDFD makes more sense.

I have already updated the patch to introduce P_PIDFD.

> 
> and could you explain in the changelog why? I am not arguing and if
> nothing else this is consistent with other pidfd features, but if you
> are parent/debugger you can't hit the problem with pid-reuse, unless
> you races with your sub-threads.

One of the things is that later on this will allow us to make it
possible to retrieve the exit status via waitid(P_PIDFD) for non-parent
processes if handed a _suitable_ pidfd that has this feature set. Maybe
even - if safe - make it possible to wait on a process as a non-parent.
And some tools just really want to do away with pids completely.

Christian

^ permalink raw reply

* Re: [PATCH 2/5] pidfd: add pidfd_wait()
From: Oleg Nesterov @ 2019-07-25 10:16 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-kernel, arnd, ebiederm, keescook, joel, tglx, tj, dhowells,
	jannh, luto, akpm, cyphar, torvalds, viro, kernel-team, linux-api
In-Reply-To: <20190724144651.28272-3-christian@brauner.io>

On 07/24, Christian Brauner wrote:
>
> +SYSCALL_DEFINE6(pidfd_wait, int, pidfd, int __user *, stat_addr,
> +		siginfo_t __user *, info, struct rusage __user *, ru,
> +		unsigned int, states, unsigned int, flags)
> +{

Oh, I too think that P_PIDFD makes more sense.

and could you explain in the changelog why? I am not arguing and if
nothing else this is consistent with other pidfd features, but if you
are parent/debugger you can't hit the problem with pid-reuse, unless
you races with your sub-threads.

Oleg.

^ permalink raw reply

* Re: [PATCH 4/5] pidfd: add CLONE_WAIT_PID
From: Christian Brauner @ 2019-07-25 10:11 UTC (permalink / raw)
  To: Jann Horn
  Cc: kernel list, Oleg Nesterov, Arnd Bergmann, Eric W. Biederman,
	Kees Cook, Joel Fernandes (Google), Thomas Gleixner, Tejun Heo,
	David Howells, Andy Lutomirski, Andrew Morton, Aleksa Sarai,
	Linus Torvalds, Al Viro, kernel-team, Ingo Molnar, Peter Zijlstra,
	Linux API
In-Reply-To: <DB572B9F-4D21-402F-A68B-CD193BBB166C@brauner.io>

On Wed, Jul 24, 2019 at 09:10:20PM +0200, Christian Brauner wrote:
> On July 24, 2019 9:07:54 PM GMT+02:00, Jann Horn <jannh@google.com> wrote:
> >On Wed, Jul 24, 2019 at 8:27 PM Christian Brauner
> ><christian@brauner.io> wrote:
> >> On July 24, 2019 8:14:26 PM GMT+02:00, Jann Horn <jannh@google.com>
> >wrote:
> >> >On Wed, Jul 24, 2019 at 4:48 PM Christian Brauner
> >> ><christian@brauner.io> wrote:
> >> >> If CLONE_WAIT_PID is set the newly created process will not be
> >> >> considered by process wait requests that wait generically on
> >children
> >> >> such as:
> >> >>
> >> >>         syscall(__NR_wait4, -1, wstatus, options, rusage)
> >> >>         syscall(__NR_waitpid, -1, wstatus, options)
> >> >>         syscall(__NR_waitid, P_ALL, -1, siginfo, options, rusage)
> >> >>         syscall(__NR_waitid, P_PGID, -1, siginfo, options, rusage)
> >> >>         syscall(__NR_waitpid, -pid, wstatus, options)
> >> >>         syscall(__NR_wait4, -pid, wstatus, options, rusage)
> >> >>
> >> >> A process created with CLONE_WAIT_PID can only be waited upon with
> >a
> >> >> focussed wait call. This ensures that processes can be reaped even
> >if
> >> >> all file descriptors referring to it are closed.
> >> >[...]
> >> >> diff --git a/kernel/fork.c b/kernel/fork.c
> >> >> index baaff6570517..a067f3876e2e 100644
> >> >> --- a/kernel/fork.c
> >> >> +++ b/kernel/fork.c
> >> >> @@ -1910,6 +1910,8 @@ static __latent_entropy struct task_struct
> >> >*copy_process(
> >> >>         delayacct_tsk_init(p);  /* Must remain after
> >> >dup_task_struct() */
> >> >>         p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE);
> >> >>         p->flags |= PF_FORKNOEXEC;
> >> >> +       if (clone_flags & CLONE_WAIT_PID)
> >> >> +               p->flags |= PF_WAIT_PID;
> >> >>         INIT_LIST_HEAD(&p->children);
> >> >>         INIT_LIST_HEAD(&p->sibling);
> >> >>         rcu_copy_process(p);
> >> >
> >> >This means that if a process with PF_WAIT_PID forks, the child
> >> >inherits the flag, right? That seems unintended? You might have to
> >add
> >> >something like "if (clone_flags & CLONE_THREAD == 0) p->flags &=
> >> >~PF_WAIT_PID;" before this. (I think threads do have to inherit the
> >> >flag so that the case where a non-leader thread of the child goes
> >> >through execve and steals the leader's identity is handled
> >properly.)
> >> >Or you could cram it somewhere into signal_struct instead of on the
> >> >task - that might be a more logical place for it?
> >>
> >> Hm, CLONE_WAIT_PID is only useable with CLONE_PIDFD which in turn is
> >> not useable with CLONE_THREAD.
> >> But we should probably make that explicit for CLONE_WAIT_PID too.
> >
> >To clarify:
> >
> >This code looks buggy to me because p->flags is inherited from the
> >parent, with the exception of flags that are explicitly stripped out.
> >Since PF_WAIT_PID is not stripped out, this means that if task A
> >creates a child B with clone(CLONE_WAIT_PID), and then task B uses
> >fork() to create a child C, then B will not be able to use
> >wait(&status) to wait for C since C inherited PF_WAIT_PID from B.
> >
> >The obvious way to fix that would be to always strip out PF_WAIT_PID;
> >but that would also be wrong, because if task B creates a thread C,
> >and then C calls execve(), the task_struct of B goes away and B's TGID
> >is taken over by C. When C eventually exits, it should still obey the
> >CLONE_WAIT_PID (since to A, it's all the same process). Therefore, if
> >p->flags is used to track whether the task was created with
> >CLONE_WAIT_PID, PF_WAIT_PID must be inherited if CLONE_THREAD is set.
> >So:
> >
> >diff --git a/kernel/fork.c b/kernel/fork.c
> >index d8ae0f1b4148..b32e1e9a6c9c 100644
> >--- a/kernel/fork.c
> >+++ b/kernel/fork.c
> >@@ -1902,6 +1902,10 @@ static __latent_entropy struct task_struct
> >*copy_process(
> >      delayacct_tsk_init(p);  /* Must remain after dup_task_struct() */
> >        p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE);
> >        p->flags |= PF_FORKNOEXEC;
> >+       if (!(clone_flags & CLONE_THREAD))
> >+               p->flags &= ~PF_PF_WAIT_PID;
> >+       if (clone_flags & CLONE_WAIT_PID)
> >+               p->flags |= PF_PF_WAIT_PID;
> >        INIT_LIST_HEAD(&p->children);
> >        INIT_LIST_HEAD(&p->sibling);
> >        rcu_copy_process(p);
> >
> >An alternative would be to not use p->flags at all, but instead make
> >this a property of the signal_struct - since the property is shared by
> >all threads, that might make more sense?
> 
> Yeah, thanks for clarifying.
> Now it's more obvious.
> I need to take a look at the signal struct before I can say anything about this.

I've been looking at this a bit late last night.
Putting this in the flags argument of signal_struct would indeed be
possible. But it feels misplaced to me there. I think the implied
semantics by having this part of task_struct are nicer, i.e. the intent
is clearer especially when the task is filtered later on in exit.c.
So unless anyone sees a clear problem or otherwise objects I would keep
it as a property of task_struct for now and fix it up.

Christian

^ permalink raw reply

* Re: [PATCH V36 27/29] tracefs: Restrict tracefs when the kernel is locked down
From: Steven Rostedt @ 2019-07-25  2:23 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: jmorris, linux-security-module, linux-kernel, linux-api,
	Matthew Garrett
In-Reply-To: <20190718194415.108476-28-matthewgarrett@google.com>

On Thu, 18 Jul 2019 12:44:13 -0700
Matthew Garrett <matthewgarrett@google.com> wrote:

> @@ -387,6 +412,7 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode,
>  				   struct dentry *parent, void *data,
>  				   const struct file_operations *fops)
>  {
> +	struct file_operations *proxy_fops;
>  	struct dentry *dentry;
>  	struct inode *inode;
>  
> @@ -402,8 +428,18 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode,
>  	if (unlikely(!inode))
>  		return failed_creating(dentry);
>  
> +	proxy_fops = kzalloc(sizeof(struct file_operations), GFP_KERNEL);
> +	if (!proxy_fops)

I believe we need "iput(inode);" here. Or move the allocation before
the inode allocation and free it on inode failure.

-- Steve

> +		return failed_creating(dentry);
> +
> +	if (!fops)
> +		fops = &tracefs_file_operations;
> +
> +	dentry->d_fsdata = (void *)fops;
> +	memcpy(proxy_fops, fops, sizeof(*proxy_fops));
> +	proxy_fops->open = default_open_file;
>  	inode->i_mode = mode;
> -	inode->i_fop = fops ? fops : &tracefs_file_operations;
> +	inode->i_fop = proxy_fops;
>  	inode->i_private = data;
>  	d_instantiate(dentry, inode);
>  	fsnotify_create(dentry->d_parent->d_inode, dentry);

^ permalink raw reply

* Re: [v4 PATCH 2/2] mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind
From: Andrew Morton @ 2019-07-25  0:44 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Yang Shi, mhocko, mgorman, linux-mm, linux-kernel, linux-api
In-Reply-To: <6aeca7cf-d9da-95cc-e6dc-a10c2978c523@suse.cz>

On Wed, 24 Jul 2019 10:19:34 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:

> On 7/23/19 7:35 AM, Yang Shi wrote:
> > 
> > 
> > On 7/22/19 6:02 PM, Andrew Morton wrote:
> >> On Mon, 22 Jul 2019 09:25:09 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:
> >>
> >>>> since there may be pages off LRU temporarily.  We should migrate other
> >>>> pages if MPOL_MF_MOVE* is specified.  Set has_unmovable flag if some
> >>>> paged could not be not moved, then return -EIO for mbind() eventually.
> >>>>
> >>>> With this change the above test would return -EIO as expected.
> >>>>
> >>>> Cc: Vlastimil Babka <vbabka@suse.cz>
> >>>> Cc: Michal Hocko <mhocko@suse.com>
> >>>> Cc: Mel Gorman <mgorman@techsingularity.net>
> >>>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> >>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> >> Thanks.
> >>
> >> I'm a bit surprised that this doesn't have a cc:stable.  Did we
> >> consider that?
> > 
> > The VM_BUG just happens on 4.9, and it is enabled only by CONFIG_VM. For 
> > post-4.9 kernel, this fixes the semantics of mbind which should be not a 
> > regression IMHO.
> 
> 4.9 is a LTS kernel, so perhaps worth trying?
> 

OK, I'll add cc:stable to 

mm-mempolicy-make-the-behavior-consistent-when-mpol_mf_move-and-mpol_mf_strict-were-specified.patch

and

mm-mempolicy-handle-vma-with-unmovable-pages-mapped-correctly-in-mbind.patch

Do we have a Fixes: for these patches?

^ permalink raw reply

* Re: [PATCH 4/5] pidfd: add CLONE_WAIT_PID
From: Christian Brauner @ 2019-07-24 19:10 UTC (permalink / raw)
  To: Jann Horn
  Cc: kernel list, Oleg Nesterov, Arnd Bergmann, Eric W. Biederman,
	Kees Cook, Joel Fernandes (Google), Thomas Gleixner, Tejun Heo,
	David Howells, Andy Lutomirski, Andrew Morton, Aleksa Sarai,
	Linus Torvalds, Al Viro, kernel-team, Ingo Molnar, Peter Zijlstra,
	Linux API
In-Reply-To: <CAG48ez1vd4Yhd3DqHVjTWM-N0MaNnX9n8MNV7MEyU5m3XDu+kQ@mail.gmail.com>

On July 24, 2019 9:07:54 PM GMT+02:00, Jann Horn <jannh@google.com> wrote:
>On Wed, Jul 24, 2019 at 8:27 PM Christian Brauner
><christian@brauner.io> wrote:
>> On July 24, 2019 8:14:26 PM GMT+02:00, Jann Horn <jannh@google.com>
>wrote:
>> >On Wed, Jul 24, 2019 at 4:48 PM Christian Brauner
>> ><christian@brauner.io> wrote:
>> >> If CLONE_WAIT_PID is set the newly created process will not be
>> >> considered by process wait requests that wait generically on
>children
>> >> such as:
>> >>
>> >>         syscall(__NR_wait4, -1, wstatus, options, rusage)
>> >>         syscall(__NR_waitpid, -1, wstatus, options)
>> >>         syscall(__NR_waitid, P_ALL, -1, siginfo, options, rusage)
>> >>         syscall(__NR_waitid, P_PGID, -1, siginfo, options, rusage)
>> >>         syscall(__NR_waitpid, -pid, wstatus, options)
>> >>         syscall(__NR_wait4, -pid, wstatus, options, rusage)
>> >>
>> >> A process created with CLONE_WAIT_PID can only be waited upon with
>a
>> >> focussed wait call. This ensures that processes can be reaped even
>if
>> >> all file descriptors referring to it are closed.
>> >[...]
>> >> diff --git a/kernel/fork.c b/kernel/fork.c
>> >> index baaff6570517..a067f3876e2e 100644
>> >> --- a/kernel/fork.c
>> >> +++ b/kernel/fork.c
>> >> @@ -1910,6 +1910,8 @@ static __latent_entropy struct task_struct
>> >*copy_process(
>> >>         delayacct_tsk_init(p);  /* Must remain after
>> >dup_task_struct() */
>> >>         p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE);
>> >>         p->flags |= PF_FORKNOEXEC;
>> >> +       if (clone_flags & CLONE_WAIT_PID)
>> >> +               p->flags |= PF_WAIT_PID;
>> >>         INIT_LIST_HEAD(&p->children);
>> >>         INIT_LIST_HEAD(&p->sibling);
>> >>         rcu_copy_process(p);
>> >
>> >This means that if a process with PF_WAIT_PID forks, the child
>> >inherits the flag, right? That seems unintended? You might have to
>add
>> >something like "if (clone_flags & CLONE_THREAD == 0) p->flags &=
>> >~PF_WAIT_PID;" before this. (I think threads do have to inherit the
>> >flag so that the case where a non-leader thread of the child goes
>> >through execve and steals the leader's identity is handled
>properly.)
>> >Or you could cram it somewhere into signal_struct instead of on the
>> >task - that might be a more logical place for it?
>>
>> Hm, CLONE_WAIT_PID is only useable with CLONE_PIDFD which in turn is
>> not useable with CLONE_THREAD.
>> But we should probably make that explicit for CLONE_WAIT_PID too.
>
>To clarify:
>
>This code looks buggy to me because p->flags is inherited from the
>parent, with the exception of flags that are explicitly stripped out.
>Since PF_WAIT_PID is not stripped out, this means that if task A
>creates a child B with clone(CLONE_WAIT_PID), and then task B uses
>fork() to create a child C, then B will not be able to use
>wait(&status) to wait for C since C inherited PF_WAIT_PID from B.
>
>The obvious way to fix that would be to always strip out PF_WAIT_PID;
>but that would also be wrong, because if task B creates a thread C,
>and then C calls execve(), the task_struct of B goes away and B's TGID
>is taken over by C. When C eventually exits, it should still obey the
>CLONE_WAIT_PID (since to A, it's all the same process). Therefore, if
>p->flags is used to track whether the task was created with
>CLONE_WAIT_PID, PF_WAIT_PID must be inherited if CLONE_THREAD is set.
>So:
>
>diff --git a/kernel/fork.c b/kernel/fork.c
>index d8ae0f1b4148..b32e1e9a6c9c 100644
>--- a/kernel/fork.c
>+++ b/kernel/fork.c
>@@ -1902,6 +1902,10 @@ static __latent_entropy struct task_struct
>*copy_process(
>      delayacct_tsk_init(p);  /* Must remain after dup_task_struct() */
>        p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE);
>        p->flags |= PF_FORKNOEXEC;
>+       if (!(clone_flags & CLONE_THREAD))
>+               p->flags &= ~PF_PF_WAIT_PID;
>+       if (clone_flags & CLONE_WAIT_PID)
>+               p->flags |= PF_PF_WAIT_PID;
>        INIT_LIST_HEAD(&p->children);
>        INIT_LIST_HEAD(&p->sibling);
>        rcu_copy_process(p);
>
>An alternative would be to not use p->flags at all, but instead make
>this a property of the signal_struct - since the property is shared by
>all threads, that might make more sense?

Yeah, thanks for clarifying.
Now it's more obvious.
I need to take a look at the signal struct before I can say anything about this.

^ permalink raw reply

* Re: [PATCH 4/5] pidfd: add CLONE_WAIT_PID
From: Jann Horn @ 2019-07-24 19:07 UTC (permalink / raw)
  To: Christian Brauner
  Cc: kernel list, Oleg Nesterov, Arnd Bergmann, Eric W. Biederman,
	Kees Cook, Joel Fernandes (Google), Thomas Gleixner, Tejun Heo,
	David Howells, Andy Lutomirski, Andrew Morton, Aleksa Sarai,
	Linus Torvalds, Al Viro, kernel-team, Ingo Molnar, Peter Zijlstra,
	Linux API
In-Reply-To: <CFB4D39F-24B9-4AD9-B19C-E2D14D38A808@brauner.io>

On Wed, Jul 24, 2019 at 8:27 PM Christian Brauner <christian@brauner.io> wrote:
> On July 24, 2019 8:14:26 PM GMT+02:00, Jann Horn <jannh@google.com> wrote:
> >On Wed, Jul 24, 2019 at 4:48 PM Christian Brauner
> ><christian@brauner.io> wrote:
> >> If CLONE_WAIT_PID is set the newly created process will not be
> >> considered by process wait requests that wait generically on children
> >> such as:
> >>
> >>         syscall(__NR_wait4, -1, wstatus, options, rusage)
> >>         syscall(__NR_waitpid, -1, wstatus, options)
> >>         syscall(__NR_waitid, P_ALL, -1, siginfo, options, rusage)
> >>         syscall(__NR_waitid, P_PGID, -1, siginfo, options, rusage)
> >>         syscall(__NR_waitpid, -pid, wstatus, options)
> >>         syscall(__NR_wait4, -pid, wstatus, options, rusage)
> >>
> >> A process created with CLONE_WAIT_PID can only be waited upon with a
> >> focussed wait call. This ensures that processes can be reaped even if
> >> all file descriptors referring to it are closed.
> >[...]
> >> diff --git a/kernel/fork.c b/kernel/fork.c
> >> index baaff6570517..a067f3876e2e 100644
> >> --- a/kernel/fork.c
> >> +++ b/kernel/fork.c
> >> @@ -1910,6 +1910,8 @@ static __latent_entropy struct task_struct
> >*copy_process(
> >>         delayacct_tsk_init(p);  /* Must remain after
> >dup_task_struct() */
> >>         p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE);
> >>         p->flags |= PF_FORKNOEXEC;
> >> +       if (clone_flags & CLONE_WAIT_PID)
> >> +               p->flags |= PF_WAIT_PID;
> >>         INIT_LIST_HEAD(&p->children);
> >>         INIT_LIST_HEAD(&p->sibling);
> >>         rcu_copy_process(p);
> >
> >This means that if a process with PF_WAIT_PID forks, the child
> >inherits the flag, right? That seems unintended? You might have to add
> >something like "if (clone_flags & CLONE_THREAD == 0) p->flags &=
> >~PF_WAIT_PID;" before this. (I think threads do have to inherit the
> >flag so that the case where a non-leader thread of the child goes
> >through execve and steals the leader's identity is handled properly.)
> >Or you could cram it somewhere into signal_struct instead of on the
> >task - that might be a more logical place for it?
>
> Hm, CLONE_WAIT_PID is only useable with CLONE_PIDFD which in turn is
> not useable with CLONE_THREAD.
> But we should probably make that explicit for CLONE_WAIT_PID too.

To clarify:

This code looks buggy to me because p->flags is inherited from the
parent, with the exception of flags that are explicitly stripped out.
Since PF_WAIT_PID is not stripped out, this means that if task A
creates a child B with clone(CLONE_WAIT_PID), and then task B uses
fork() to create a child C, then B will not be able to use
wait(&status) to wait for C since C inherited PF_WAIT_PID from B.

The obvious way to fix that would be to always strip out PF_WAIT_PID;
but that would also be wrong, because if task B creates a thread C,
and then C calls execve(), the task_struct of B goes away and B's TGID
is taken over by C. When C eventually exits, it should still obey the
CLONE_WAIT_PID (since to A, it's all the same process). Therefore, if
p->flags is used to track whether the task was created with
CLONE_WAIT_PID, PF_WAIT_PID must be inherited if CLONE_THREAD is set.
So:

diff --git a/kernel/fork.c b/kernel/fork.c
index d8ae0f1b4148..b32e1e9a6c9c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1902,6 +1902,10 @@ static __latent_entropy struct task_struct *copy_process(
        delayacct_tsk_init(p);  /* Must remain after dup_task_struct() */
        p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE);
        p->flags |= PF_FORKNOEXEC;
+       if (!(clone_flags & CLONE_THREAD))
+               p->flags &= ~PF_PF_WAIT_PID;
+       if (clone_flags & CLONE_WAIT_PID)
+               p->flags |= PF_PF_WAIT_PID;
        INIT_LIST_HEAD(&p->children);
        INIT_LIST_HEAD(&p->sibling);
        rcu_copy_process(p);

An alternative would be to not use p->flags at all, but instead make
this a property of the signal_struct - since the property is shared by
all threads, that might make more sense?

^ permalink raw reply related

* Re: [PATCH 4/5] pidfd: add CLONE_WAIT_PID
From: Christian Brauner @ 2019-07-24 18:27 UTC (permalink / raw)
  To: Jann Horn
  Cc: kernel list, Oleg Nesterov, Arnd Bergmann, Eric W. Biederman,
	Kees Cook, Joel Fernandes (Google), Thomas Gleixner, Tejun Heo,
	David Howells, Andy Lutomirski, Andrew Morton, Aleksa Sarai,
	Linus Torvalds, Al Viro, kernel-team, Ingo Molnar, Peter Zijlstra,
	Linux API
In-Reply-To: <CAG48ez3nuY__qvctoOnX7mQbjjP4chEs4K-OPxSQficiPLS18w@mail.gmail.com>

On July 24, 2019 8:14:26 PM GMT+02:00, Jann Horn <jannh@google.com> wrote:
>On Wed, Jul 24, 2019 at 4:48 PM Christian Brauner
><christian@brauner.io> wrote:
>> If CLONE_WAIT_PID is set the newly created process will not be
>> considered by process wait requests that wait generically on children
>> such as:
>>
>>         syscall(__NR_wait4, -1, wstatus, options, rusage)
>>         syscall(__NR_waitpid, -1, wstatus, options)
>>         syscall(__NR_waitid, P_ALL, -1, siginfo, options, rusage)
>>         syscall(__NR_waitid, P_PGID, -1, siginfo, options, rusage)
>>         syscall(__NR_waitpid, -pid, wstatus, options)
>>         syscall(__NR_wait4, -pid, wstatus, options, rusage)
>>
>> A process created with CLONE_WAIT_PID can only be waited upon with a
>> focussed wait call. This ensures that processes can be reaped even if
>> all file descriptors referring to it are closed.
>[...]
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index baaff6570517..a067f3876e2e 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -1910,6 +1910,8 @@ static __latent_entropy struct task_struct
>*copy_process(
>>         delayacct_tsk_init(p);  /* Must remain after
>dup_task_struct() */
>>         p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE);
>>         p->flags |= PF_FORKNOEXEC;
>> +       if (clone_flags & CLONE_WAIT_PID)
>> +               p->flags |= PF_WAIT_PID;
>>         INIT_LIST_HEAD(&p->children);
>>         INIT_LIST_HEAD(&p->sibling);
>>         rcu_copy_process(p);
>
>This means that if a process with PF_WAIT_PID forks, the child
>inherits the flag, right? That seems unintended? You might have to add
>something like "if (clone_flags & CLONE_THREAD == 0) p->flags &=
>~PF_WAIT_PID;" before this. (I think threads do have to inherit the
>flag so that the case where a non-leader thread of the child goes
>through execve and steals the leader's identity is handled properly.)
>Or you could cram it somewhere into signal_struct instead of on the
>task - that might be a more logical place for it?

Hm, CLONE_WAIT_PID is only useable with CLONE_PIDFD which in turn is
not useable with CLONE_THREAD.
But we should probably make that explicit for CLONE_WAIT_PID too.

^ permalink raw reply

* Re: [PATCH 4/5] pidfd: add CLONE_WAIT_PID
From: Jann Horn @ 2019-07-24 18:14 UTC (permalink / raw)
  To: Christian Brauner
  Cc: kernel list, Oleg Nesterov, Arnd Bergmann, Eric W. Biederman,
	Kees Cook, Joel Fernandes (Google), Thomas Gleixner, Tejun Heo,
	David Howells, Andy Lutomirski, Andrew Morton, Aleksa Sarai,
	Linus Torvalds, Al Viro, kernel-team, Ingo Molnar, Peter Zijlstra,
	Linux API
In-Reply-To: <20190724144651.28272-5-christian@brauner.io>

On Wed, Jul 24, 2019 at 4:48 PM Christian Brauner <christian@brauner.io> wrote:
> If CLONE_WAIT_PID is set the newly created process will not be
> considered by process wait requests that wait generically on children
> such as:
>
>         syscall(__NR_wait4, -1, wstatus, options, rusage)
>         syscall(__NR_waitpid, -1, wstatus, options)
>         syscall(__NR_waitid, P_ALL, -1, siginfo, options, rusage)
>         syscall(__NR_waitid, P_PGID, -1, siginfo, options, rusage)
>         syscall(__NR_waitpid, -pid, wstatus, options)
>         syscall(__NR_wait4, -pid, wstatus, options, rusage)
>
> A process created with CLONE_WAIT_PID can only be waited upon with a
> focussed wait call. This ensures that processes can be reaped even if
> all file descriptors referring to it are closed.
[...]
> diff --git a/kernel/fork.c b/kernel/fork.c
> index baaff6570517..a067f3876e2e 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1910,6 +1910,8 @@ static __latent_entropy struct task_struct *copy_process(
>         delayacct_tsk_init(p);  /* Must remain after dup_task_struct() */
>         p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE);
>         p->flags |= PF_FORKNOEXEC;
> +       if (clone_flags & CLONE_WAIT_PID)
> +               p->flags |= PF_WAIT_PID;
>         INIT_LIST_HEAD(&p->children);
>         INIT_LIST_HEAD(&p->sibling);
>         rcu_copy_process(p);

This means that if a process with PF_WAIT_PID forks, the child
inherits the flag, right? That seems unintended? You might have to add
something like "if (clone_flags & CLONE_THREAD == 0) p->flags &=
~PF_WAIT_PID;" before this. (I think threads do have to inherit the
flag so that the case where a non-leader thread of the child goes
through execve and steals the leader's identity is handled properly.)
Or you could cram it somewhere into signal_struct instead of on the
task - that might be a more logical place for it?

^ permalink raw reply

* Re: [PATCH 2/5] pidfd: add pidfd_wait()
From: Christian Brauner @ 2019-07-24 17:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux List Kernel Mailing, Oleg Nesterov, Arnd Bergmann,
	Eric W. Biederman, Kees Cook, Joel Fernandes, Thomas Gleixner,
	Tejun Heo, David Howells, Jann Horn, Andrew Lutomirski,
	Andrew Morton, Aleksa Sarai, Al Viro, Android Kernel Team,
	Linux API
In-Reply-To: <95CD0533-576F-4B3A-8E80-D3D89967EE2C@brauner.io>

On July 24, 2019 7:50:49 PM GMT+02:00, Christian Brauner <christian@brauner.io> wrote:
>On July 24, 2019 7:45:38 PM GMT+02:00, Linus Torvalds
><torvalds@linux-foundation.org> wrote:
>>On Wed, Jul 24, 2019 at 7:47 AM Christian Brauner
>><christian@brauner.io> wrote:
>>>
>>> This adds the pidfd_wait() syscall.
>>
>>I despise this patch.
>>
>>Why can't this just be a new P_PIDFD flag, and then use
>>"waitid(P_PIDFD, pidfd, ...);"
>>
>>Yes, yes, yes, I realize that "pidfd" is of type "int", and waitid()
>>takes an argument of type pid_t, but it's the same type in the end,
>>and it does seem like the whole *point* of "waitid()" is that
>>"idtype_t idtype" which tells you what kind of ID you're passing it.
>>
>>               Linus
>
>Well in that case we could add P_PIDFD.
>But then I would like to _only_ enable it for waitid(). How's that
>sound?
>
>Christian

Ah, sorry, just saw that that's what you suggested.

Christian

^ permalink raw reply

* Re: [PATCH 2/5] pidfd: add pidfd_wait()
From: Christian Brauner @ 2019-07-24 17:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux List Kernel Mailing, Oleg Nesterov, Arnd Bergmann,
	Eric W. Biederman, Kees Cook, Joel Fernandes, Thomas Gleixner,
	Tejun Heo, David Howells, Jann Horn, Andrew Lutomirski,
	Andrew Morton, Aleksa Sarai, Al Viro, Android Kernel Team,
	Linux API
In-Reply-To: <CAHk-=whZPKzbPQftNGFB=iaSZGTSXNkhUASWF2V53MwB+A4zAQ@mail.gmail.com>

On July 24, 2019 7:45:38 PM GMT+02:00, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>On Wed, Jul 24, 2019 at 7:47 AM Christian Brauner
><christian@brauner.io> wrote:
>>
>> This adds the pidfd_wait() syscall.
>
>I despise this patch.
>
>Why can't this just be a new P_PIDFD flag, and then use
>"waitid(P_PIDFD, pidfd, ...);"
>
>Yes, yes, yes, I realize that "pidfd" is of type "int", and waitid()
>takes an argument of type pid_t, but it's the same type in the end,
>and it does seem like the whole *point* of "waitid()" is that
>"idtype_t idtype" which tells you what kind of ID you're passing it.
>
>               Linus

Well in that case we could add P_PIDFD.
But then I would like to _only_ enable it for waitid(). How's that sound?

Christian

^ permalink raw reply

* Re: [PATCH 2/5] pidfd: add pidfd_wait()
From: Linus Torvalds @ 2019-07-24 17:45 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Linux List Kernel Mailing, Oleg Nesterov, Arnd Bergmann,
	Eric W. Biederman, Kees Cook, Joel Fernandes, Thomas Gleixner,
	Tejun Heo, David Howells, Jann Horn, Andrew Lutomirski,
	Andrew Morton, Aleksa Sarai, Al Viro, Android Kernel Team,
	Linux API
In-Reply-To: <20190724144651.28272-3-christian@brauner.io>

On Wed, Jul 24, 2019 at 7:47 AM Christian Brauner <christian@brauner.io> wrote:
>
> This adds the pidfd_wait() syscall.

I despise this patch.

Why can't this just be a new P_PIDFD flag, and then use
"waitid(P_PIDFD, pidfd, ...);"

Yes, yes, yes, I realize that "pidfd" is of type "int", and waitid()
takes an argument of type pid_t, but it's the same type in the end,
and it does seem like the whole *point* of "waitid()" is that
"idtype_t idtype" which tells you what kind of ID you're passing it.

               Linus

^ permalink raw reply

* Re: [PATCH v4 0/3] initramfs: add support for xattrs in the initial ram disk
From: Roberto Sassu @ 2019-07-24 15:34 UTC (permalink / raw)
  To: Rob Landley, hpa, Arvind Sankar
  Cc: Mimi Zohar, viro, linux-security-module, linux-integrity,
	initramfs, linux-api, linux-fsdevel, linux-kernel, bug-cpio,
	zohar, silviu.vlasceanu, dmitry.kasatkin, takondra, kamensky,
	arnd, james.w.mcmechan
In-Reply-To: <f85ed711-f583-51cd-34e2-80018a592280@huawei.com>

Is there anything I didn't address in this patch set, that is delaying
the review? I would appreciate if you can give me a feedback, positive
or negative.

Thanks a lot!

Roberto


On 7/15/2019 6:54 PM, Roberto Sassu wrote:
> Rob, Peter, Arvind, did you have the chance to have a look at this
> version of the patch set?
> 
> Thanks
> 
> Roberto
> 
> 
> On 7/1/2019 4:31 PM, Mimi Zohar wrote:
>> On Mon, 2019-07-01 at 16:42 +0300, Roberto Sassu wrote:
>>> On 6/30/2019 6:39 PM, Mimi Zohar wrote:
>>>> On Wed, 2019-06-26 at 10:15 +0200, Roberto Sassu wrote:
>>>>> On 6/3/2019 8:32 PM, Rob Landley wrote:
>>>>>> On 6/3/19 4:31 AM, Roberto Sassu wrote:
>>>>>>>> This patch set aims at solving the following use case: appraise 
>>>>>>>> files from
>>>>>>>> the initial ram disk. To do that, IMA checks the signature/hash 
>>>>>>>> from the
>>>>>>>> security.ima xattr. Unfortunately, this use case cannot be 
>>>>>>>> implemented
>>>>>>>> currently, as the CPIO format does not support xattrs.
>>>>>>>>
>>>>>>>> This proposal consists in including file metadata as additional 
>>>>>>>> files named
>>>>>>>> METADATA!!!, for each file added to the ram disk. The CPIO 
>>>>>>>> parser in the
>>>>>>>> kernel recognizes these special files from the file name, and 
>>>>>>>> calls the
>>>>>>>> appropriate parser to add metadata to the previously extracted 
>>>>>>>> file. It has
>>>>>>>> been proposed to use bit 17:16 of the file mode as a way to 
>>>>>>>> recognize files
>>>>>>>> with metadata, but both the kernel and the cpio tool declare the 
>>>>>>>> file mode
>>>>>>>> as unsigned short.
>>>>>>>
>>>>>>> Any opinion on this patch set?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Roberto
>>>>>>
>>>>>> Sorry, I've had the window open since you posted it but haven't 
>>>>>> gotten around to
>>>>>> it. I'll try to build it later today.
>>>>>>
>>>>>> It does look interesting, and I have no objections to the basic 
>>>>>> approach. I
>>>>>> should be able to add support to toybox cpio over a weekend once 
>>>>>> I've got the
>>>>>> kernel doing it to test against.
>>>>>
>>>>> Ok.
>>>>>
>>>>> Let me give some instructions so that people can test this patch set.
>>>>>
>>>>> To add xattrs to the ram disk embedded in the kernel it is sufficient
>>>>> to set CONFIG_INITRAMFS_FILE_METADATA="xattr" and
>>>>> CONFIG_INITRAMFS_SOURCE="<file with xattr>" in the kernel 
>>>>> configuration.
>>>>>
>>>>> To add xattrs to the external ram disk, it is necessary to patch cpio:
>>>>>
>>>>> https://github.com/euleros/cpio/commit/531cabc88e9ecdc3231fad6e4856869baa9a91ef 
>>>>>
>>>>> (xattr-v1 branch)
>>>>>
>>>>> and dracut:
>>>>>
>>>>> https://github.com/euleros/dracut/commit/a2dee56ea80495c2c1871bc73186f7b00dc8bf3b 
>>>>>
>>>>> (digest-lists branch)
>>>>>
>>>>> The same modification can be done for mkinitramfs (add '-e xattr' 
>>>>> to the
>>>>> cpio command line).
>>>>>
>>>>> To simplify the test, it would be sufficient to replace only the cpio
>>>>> binary and the dracut script with the modified versions. For 
>>>>> dracut, the
>>>>> patch should be applied to the local dracut (after it has been renamed
>>>>> to dracut.sh).
>>>>>
>>>>> Then, run:
>>>>>
>>>>> dracut -e xattr -I <file with xattr> (add -f to overwrite the ram 
>>>>> disk)
>>>>>
>>>>> Xattrs can be seen by stopping the boot process for example by adding
>>>>> rd.break to the kernel command line.
>>>>
>>>> A simple way of testing, without needing any changes other than the
>>>> kernel patches, is to save the dracut temporary directory by supplying
>>>> "--keep" on the dracut command line, calling
>>>> usr/gen_initramfs_list.sh, followed by usr/gen_init_cpio with the "-e
>>>> xattr" option.
>>>
>>> Alternatively, follow the instructions to create the embedded ram disk
>>> with xattrs, and use the existing external ram disk created with dracut
>>> to check if xattrs are created.
>>
>> True, but this alternative is for those who normally use dracut to
>> create an initramfs, but don't want to update cpio or dracut.
>>
>> Mimi
>>
> 

-- 
HUAWEI TECHNOLOGIES Duesseldorf GmbH, HRB 56063
Managing Director: Li Peng, Li Jian, Shi Yanli

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox