All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Demi Marie Obenour <demiobenour@gmail.com>,
	Mateusz Guzik <mjguzik@gmail.com>,
	Christian Brauner <brauner@kernel.org>
Cc: Linux kernel mailing list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] kernel: Prevent prctl(PR_SET_PDEATHSIG) from racing with parent process exit
Date: Tue, 23 Sep 2025 14:03:45 +0200	[thread overview]
Message-ID: <20250923120344.GA12377@redhat.com> (raw)
In-Reply-To: <20250922154819.c3049158ca006e1561ff5dcb@linux-foundation.org>

Hi Demi,

(add more CC's)

I think your patch is correct in that it fixes the problems described in
the changelog. Thanks for the detailed explanations.

But I obviously dislike the fact it adds read_lock(tasklist_lock) into
prctl(PR_SET_PDEATHSIG) ;)

For the moment lets forget about the data race. Of course you right, but
this is simple, we can add READ/WRITE_ONCE(pdeath_signal).

Lets only discuss the 2nd problem: the exiting parent can miss
child->pdeath_signal != 0 while the child can still see the old getppid().
I honestly do not know if we really care, I always thought that this API
is ancient/obsolete and "strange". If nothing else, pdeath_signal is sent
even in the case of a threaded reparent...

But OK, lets suppose we need to fix this problem. Not that it matters, but
I guess you didn't hit it in practice?

As you correctly pointed out, forget_original_parent/prctl lack the necessary
barries. So lets add the barriers instead of abusing tasklist? As for sys_prctl(),
I think that ret-to-user-mode + enter-the-kernel-mode should act as a full
barrier, so it only needs WRITE_ONCE()...

Or perhaps user-space can do something else to sync with the exiting parent
instead of using getppid() ?

Oleg.

On 09/22, Andrew Morton wrote:
>
> From: Demi Marie Obenour <demiobenour@gmail.com>
> Subject: kernel: prevent prctl(PR_SET_PDEATHSIG) from racing with parent process exit
> Date: Sat, 13 Sep 2025 18:28:49 -0400
>
> If a process calls prctl(PR_SET_PDEATHSIG) at the same time that the
> parent process exits, the child will write to me->pdeath_sig at the same
> time the parent is reading it.  Since there is no synchronization, this is
> a data race.
>
> Worse, it is possible that a subsequent call to getppid() can continue to
> return the previous parent process ID without the parent death signal
> being delivered.  This happens in the following scenario:
>
> parent                                                 child
>
> forget_original_parent()                               prctl(PR_SET_PDEATHSIG, SIGKILL)
>                                                          sys_prctl()
>                                                            me->pdeath_sig = SIGKILL;
>                                                        getppid();
>   RCU_INIT_POINTER(t->real_parent, reaper);
>   if (t->pdeath_signal) /* reads stale me->pdeath_sig */
>            group_send_sig_info(t->pdeath_signal, ...);
>
> And in the following:
>
> parent                                                 child
>
> forget_original_parent()
>     RCU_INIT_POINTER(t->real_parent, reaper);
>     /* also no barrier */
>      if (t->pdeath_signal) /* reads stale me->pdeath_sig */
>              group_send_sig_info(t->pdeath_signal, ...);
>
>                                                        prctl(PR_SET_PDEATHSIG, SIGKILL)
>                                                          sys_prctl()
>                                                            me->pdeath_sig = SIGKILL;
>                                                        getppid(); /* reads old ppid() */
>
> As a result, the following pattern is racy:
>
> 	pid_t parent_pid = getpid();
> 	pid_t child_pid = fork();
> 	if (child_pid == -1) {
> 		/* handle error... */
> 		return;
> 	}
> 	if (child_pid == 0) {
> 		if (prctl(PR_SET_PDEATHSIG, SIGKILL) != 0) {
> 			/* handle error */
> 			_exit(126);
> 		}
> 		if (getppid() != parent_pid) {
> 			/* parent died already */
> 			raise(SIGKILL);
> 		}
> 		/* keep going in child */
> 	}
> 	/* keep going in parent */
>
> If the parent is killed at exactly the wrong time, the child process can
> (wrongly) stay running.
>
> I didn't manage to reproduce this in my testing, but I'm pretty sure the
> race is real.  KCSAN is probably the best way to spot the race.
>
> Fix the bug by holding tasklist_lock for reading whenever pdeath_signal is
> being written to.  This prevents races on me->pdeath_sig, and the locking
> and unlocking of the rwlock provide the needed memory barriers.  If
> prctl(PR_SET_PDEATHSIG) happens before the parent exits, the signal will
> be sent.  If it happens afterwards, a subsequent getppid() will return the
> new value.
>
> Link: https://lkml.kernel.org/r/20250913-fix-prctl-pdeathsig-race-v1-1-44e2eb426fe9@gmail.com
> Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>  kernel/sys.c |   10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> --- a/kernel/sys.c~kernel-prevent-prctlpr_set_pdeathsig-from-racing-with-parent-process-exit
> +++ a/kernel/sys.c
> @@ -2533,7 +2533,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
>  			error = -EINVAL;
>  			break;
>  		}
> +		/*
> +		 * Ensure that either:
> +		 *
> +		 * 1. Subsequent getppid() calls reflect the parent process having died.
> +		 * 2. forget_original_parent() will send the new me->pdeath_signal.
> +		 *
> +		 * Also prevent the read of me->pdeath_signal from being a data race.
> +		 */
> +		read_lock(&tasklist_lock);
>  		me->pdeath_signal = arg2;
> +		read_unlock(&tasklist_lock);
>  		break;
>  	case PR_GET_PDEATHSIG:
>  		error = put_user(me->pdeath_signal, (int __user *)arg2);
> _
>


  reply	other threads:[~2025-09-23 12:05 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-13 22:28 [PATCH] kernel: Prevent prctl(PR_SET_PDEATHSIG) from racing with parent process exit Demi Marie Obenour
2025-09-13 22:28 ` Demi Marie Obenour via B4 Relay
2025-09-20  4:10 ` Demi Marie Obenour
2025-09-22 22:48   ` Andrew Morton
2025-09-23 12:03     ` Oleg Nesterov [this message]
2025-09-23 13:39       ` Mateusz Guzik
2025-09-25 16:28         ` Oleg Nesterov
2025-09-25 18:35           ` Mateusz Guzik
2025-09-25 18:50             ` Oleg Nesterov
2025-09-26 23:58             ` Demi Marie Obenour
2025-09-27  1:51               ` Mateusz Guzik
2025-10-07 12:53         ` Christian Brauner
2025-09-20  4:10 ` Demi Marie Obenour

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250923120344.GA12377@redhat.com \
    --to=oleg@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=demiobenour@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mjguzik@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.