Re: [PATCH v7 1/2] fork: extend clone3() to support setting a PID

From: Adrian Reber <areber@redhat.com>
To: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Christian Brauner <christian.brauner@ubuntu.com>,
	Eric Biederman <ebiederm@xmission.com>,
	Pavel Emelyanov <ovzxemul@gmail.com>,
	Jann Horn <jannh@google.com>, Oleg Nesterov <oleg@redhat.com>,
	Dmitry Safonov <0x7f454c46@gmail.com>,
	linux-kernel@vger.kernel.org, Andrei Vagin <avagin@gmail.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Radostin Stoyanov <rstoyanov1@gmail.com>
Subject: Re: [PATCH v7 1/2] fork: extend clone3() to support setting a PID
Date: Wed, 13 Nov 2019 09:02:25 +0100
Message-ID: <20191113080225.GA1028126@dcbz.redhat.com>
In-Reply-To: <cc5f90b6-ea1f-dbdb-e713-cc0fceceafbe@rasmusvillemoes.dk>

On Mon, Nov 11, 2019 at 09:41:39PM +0100, Rasmus Villemoes wrote:
> On 11/11/2019 14.17, Adrian Reber wrote:
> > The main motivation to add set_tid to clone3() is CRIU.
> > 
> > To restore a process with the same PID/TID CRIU currently uses
> > /proc/sys/kernel/ns_last_pid. It writes the desired (PID - 1) to
> > ns_last_pid and then (quickly) does a clone(). This works most of the
> > time, but it is racy. It is also slow as it requires multiple syscalls.
> > 
> > Extending clone3() to support *set_tid makes it possible to restore a
> > process using CRIU without accessing /proc/sys/kernel/ns_last_pid and
> > without the race (as long as the desired PID/TID is available).
> > 
> > This clone3() extension places the same restrictions (CAP_SYS_ADMIN)
> > on clone3() with *set_tid as are currently in place for ns_last_pid.
> > 
> > The original version of this change was using a single value for
> > set_tid. At the 2019 LPC, after presenting set_tid, it was, however,
> > decided to change set_tid to an array to enable setting the PID of a
> > process in multiple PID namespaces at the same time. If a process is
> > created in a PID namespace it is possible to influence the PID inside
> > and outside of the PID namespace. Details are also in the corresponding
> > selftest.
> > 
> 
> >  	/*
> >  	 * Verify that higher 32bits of exit_signal are unset and that
> >  	 * it is a valid signal
> > @@ -2556,8 +2561,17 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
> >  		.stack		= args.stack,
> >  		.stack_size	= args.stack_size,
> >  		.tls		= args.tls,
> > +		.set_tid	= kargs->set_tid,
> > +		.set_tid_size	= args.set_tid_size,
> >  	};
> 
> This is a bit ugly. And is it even well-defined? I mean, it's a bit
> similar to "i = i++;". So it would be best to avoid it.
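
One way to sidestep the question raised above is to read the pointer into
a local variable before the struct is overwritten, and to put it back
afterwards. The following is only a userspace sketch under assumptions:
struct demo_args and fill_args() are hypothetical stand-ins, not the
kernel's struct kernel_clone_args or copy_clone_args_from_user(), and it
assumes the initializer in the hunk is assigned to *kargs as a compound
literal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel-side argument struct. */
struct demo_args {
	uint64_t flags;
	int *set_tid;		/* caller-provided buffer that must survive */
	size_t set_tid_size;
};

/*
 * Overwrite *kargs with fresh values while keeping the set_tid pointer
 * the caller stored there. The pointer is saved first, so the
 * initializer never reads from the object being assigned.
 */
static void fill_args(struct demo_args *kargs, size_t set_tid_size)
{
	int *saved_set_tid;

	assert(kargs != NULL);
	saved_set_tid = kargs->set_tid;

	*kargs = (struct demo_args){
		.set_tid_size = set_tid_size,
	};
	kargs->set_tid = saved_set_tid;
}
```

With this shape there is no expression that both reads and replaces
*kargs, so nobody has to reason about sequencing to review it.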
> 
> > +	for (i = 0; i < args.set_tid_size; i++) {
> > +		if (copy_from_user(&kargs->set_tid[i],
> > +		    u64_to_user_ptr(args.set_tid + (i * sizeof(args.set_tid))),
> > +		    sizeof(pid_t)))
> > +			return -EFAULT;
> > +	}
> > +
> 
> If I'm reading this (and your test case) right, you expect the user
> pointer to point at an array of u64, and here you're copying the first
> half of each u64 to the pid_t array. That only works on little-endian.
> 
> It seems more obvious (since I don't think there's any disagreement
> anywhere on sizeof(pid_t)) to expect the user pointer to point at an
> array of pid_t and then simply copy_from_user() the whole thing in one go.
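
To make the endianness point concrete, here is a small userspace sketch,
not kernel code. It uses memcpy() in place of copy_from_user(), and
demo_pid_t, copy_first_half_of_u64() and copy_pid_array() are names made
up for the illustration. The first helper mimics the loop in the hunk
(8-byte stride, 4 bytes copied per element); the second is the suggested
alternative of copying an array of pid_t in one go. It assumes pid_t is a
32-bit integer, as it is on Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for pid_t, assumed to be a 32-bit signed integer. */
typedef int32_t demo_pid_t;

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
	const uint16_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 1;
}

/*
 * Mimics the loop from the patch: walk the source in 8-byte strides but
 * copy only the first sizeof(demo_pid_t) bytes of each slot. Those bytes
 * hold the low half of the value only on little-endian hosts.
 */
static void copy_first_half_of_u64(demo_pid_t *dst, const uint64_t *src,
				   size_t n)
{
	const unsigned char *bytes = (const unsigned char *)src;
	size_t i;

	assert(dst != NULL && src != NULL);
	for (i = 0; i < n; i++)
		memcpy(&dst[i], bytes + i * sizeof(uint64_t),
		       sizeof(demo_pid_t));
}

/* Suggested alternative: the source is an array of pid_t, copied whole. */
static void copy_pid_array(demo_pid_t *dst, const demo_pid_t *src, size_t n)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, n * sizeof(demo_pid_t));
}
```

For small positive PIDs stored in u64 slots, the first helper returns the
PIDs on a little-endian host and zeros on a big-endian one, while the
second returns the PIDs regardless of byte order.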

Yes, that was wrong. I changed the test case to use an array of pid_t.

		Adrian
