From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4E72E2EDD72; Tue, 11 Nov 2025 09:22:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762852933; cv=none; b=H+kZANSjTbztwhag3Jq/FnuVwkZ8f77Tk6SNHX6JxN2MpAYlXefW8gEgpmVX+AeU+5felhayazQo8PuORLF6TGPkdWX6T7t89oxCnBYjCgjlxtTheT+fZZKvMh/S4yaKW43tichbnUxFg2GWgMeepqhU6yt25ouABgD0qIOqk1c= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762852933; c=relaxed/simple; bh=YKxB1Tk0c8PA5NVe6KSLcg4q1RUhiINUvY7b/ZMmRUw=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=qg+iqOb2Hbu/X4vfTdGV4HbZTyn6Ut0aPCXzCYA/4IXG2FoAcipgv6G782KoAmvgdZS1MQc7zHeGfVRsU4gjv8RPXI/NFjX41cbFln4B+YFFc2+n2zmKVinCaRNLfw0msQ0Xz04OJjuT6RGk/QbSWlVwMK/2iXLGm6AMBf9YFsg= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Iq/i8hPW; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Iq/i8hPW" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 45B43C19422; Tue, 11 Nov 2025 09:22:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1762852932; bh=YKxB1Tk0c8PA5NVe6KSLcg4q1RUhiINUvY7b/ZMmRUw=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=Iq/i8hPWgHTwu435PI+UGZlLN/X3lyfBrrm+1OwNKieirVMDPmDCkCaFA5KX+Cpoo jCY0uq+bKxbfSS/6qRug9ACZVWvSIG2hb05Nfywkq6thFqPc80S2JztzaJqBCtX5TK 6ankYgSWSaELPVPWooj+5cT1eMoRBXuIxw8R7wm0f3aC08zRAHgGqQPhTXlh+50TBr zxjneSB775FAK9WKt6FxPW2yAZcqlXQ0OEYW8MPlkyO1msbbTBkJLfLsWoPQX0KMwY 2kPRjg5gUXXR3fF2VCX329Z0qynUJi5o5aKfiv7nNQoaN9iGP9xwUFoDOYr6HTaRw8 daoW/qmbnCCrQ== Date: Tue, 11 Nov 2025 10:21:58 +0100 From: Christian Brauner To: Oleg Nesterov Cc: Bernd Edlinger , Alexander Viro , Alexey Dobriyan , Kees Cook , Andy Lutomirski , Will Drewry , Andrew Morton , Michal Hocko , Serge Hallyn , James Morris , Randy Dunlap , Suren Baghdasaryan , Yafang Shao , Helge Deller , "Eric W. Biederman" , Adrian Reber , Thomas Gleixner , Jens Axboe , Alexei Starovoitov , "linux-fsdevel@vger.kernel.org" , "linux-kernel@vger.kernel.org" , linux-kselftest@vger.kernel.org, linux-mm@kvack.org, linux-security-module@vger.kernel.org, tiozhang , Luis Chamberlain , "Paulo Alcantara (SUSE)" , Sergey Senozhatsky , Frederic Weisbecker , YueHaibing , Paul Moore , Aleksa Sarai , Stefan Roesch , Chao Yu , xu xin , Jeff Layton , Jan Kara , David Hildenbrand , Dave Chinner , Shuah Khan , Elena Reshetova , David Windsor , Mateusz Guzik , Ard Biesheuvel , "Joel Fernandes (Google)" , "Matthew Wilcox (Oracle)" , Hans Liljestrand , Penglei Jiang , Lorenzo Stoakes , Adrian Ratiu , Ingo Molnar , "Peter Zijlstra (Intel)" , Cyrill Gorcunov , Eric Dumazet Subject: Re: [PATCH v17] exec: Fix dead-lock in de_thread with ptrace_attach Message-ID: <20251111-ankreiden-augen-eadcf9bbdfaa@brauner> References: <20251105143210.GA25535@redhat.com> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20251105143210.GA25535@redhat.com> On Wed, Nov 05, 2025 at 03:32:10PM +0100, Oleg Nesterov wrote: > I am still thinking about another approach, will write another email. > But let me take a closer look at your patch. > > First of all, can you split it? See below. > > On 08/21, Bernd Edlinger wrote: > > > > -static int de_thread(struct task_struct *tsk) > > +static int de_thread(struct task_struct *tsk, struct linux_binprm *bprm) > > { > > struct signal_struct *sig = tsk->signal; > > struct sighand_struct *oldsighand = tsk->sighand; > > spinlock_t *lock = &oldsighand->siglock; > > + struct task_struct *t; > > + bool unsafe_execve_in_progress = false; > > > > if (thread_group_empty(tsk)) > > goto no_thread_group; > > @@ -932,6 +934,19 @@ static int de_thread(struct task_struct *tsk) > > if (!thread_group_leader(tsk)) > > sig->notify_count--; > > > > + for_other_threads(tsk, t) { > > + if (unlikely(t->ptrace) > > + && (t != tsk->group_leader || !t->exit_state)) > > + unsafe_execve_in_progress = true; > > you can add "break" into the "if ()" block... > > But this is minor. Why do we need "bool unsafe_execve_in_progress" ? > If this patch is correct, de_thread() can drop/reacquire cred_guard_mutex > unconditionally. > > If you really think it makes sense, please make another patch with the > changelog. > > I'd certainly prefer to avoid this boolean at least for the start. If nothing > else to catch the potential problems earlier. > > > + if (unlikely(unsafe_execve_in_progress)) { > > + spin_unlock_irq(lock); > > + sig->exec_bprm = bprm; > > + mutex_unlock(&sig->cred_guard_mutex); > > + spin_lock_irq(lock); > > I don't think spin_unlock_irq() + spin_lock_irq() makes any sense... > > > @@ -1114,13 +1139,31 @@ int begin_new_exec(struct linux_binprm * bprm) > > */ > > trace_sched_prepare_exec(current, bprm); > > > > + /* If the binary is not readable then enforce mm->dumpable=0 */ > > + would_dump(bprm, bprm->file); > > + if (bprm->have_execfd) > > + would_dump(bprm, bprm->executable); > > + > > + /* > > + * Figure out dumpability. Note that this checking only of current > > + * is wrong, but userspace depends on it. This should be testing > > + * bprm->secureexec instead. > > + */ > > + if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP || > > + is_dumpability_changed(current_cred(), bprm->cred) || > > + !(uid_eq(current_euid(), current_uid()) && > > + gid_eq(current_egid(), current_gid()))) > > + set_dumpable(bprm->mm, suid_dumpable); > > + else > > + set_dumpable(bprm->mm, SUID_DUMP_USER); > > + > > OK, we need to do this before de_thread() drops cred_guard_mutex. > But imo this too should be done in a separate patch, the changelog should > explain this change. > > > @@ -1361,6 +1387,11 @@ static int prepare_bprm_creds(struct linux_binprm *bprm) > > if (mutex_lock_interruptible(¤t->signal->cred_guard_mutex)) > > return -ERESTARTNOINTR; > > > > + if (unlikely(current->signal->exec_bprm)) { > > + mutex_unlock(¤t->signal->cred_guard_mutex); > > + return -ERESTARTNOINTR; > > + } > > OK, if signal->exec_bprm != NULL, then current is already killed. But > proc_pid_attr_write() and ptrace_traceme() do the same. So how about > something like > > int lock_current_cgm(void) > { > if (mutex_lock_interruptible(¤t->signal->cred_guard_mutex)) > return -ERESTARTNOINTR; > > if (!current->signal->group_exec_task) > return 0; > > WARN_ON(!fatal_signal_pending(current)); > mutex_unlock(¤t->signal->cred_guard_mutex); > return -ERESTARTNOINTR; > } > > ? > > Note that it checks ->group_exec_task, not ->exec_bprm. So this change can > come in a separate patch too, but I won't insist. > > > @@ -453,6 +454,28 @@ static int ptrace_attach(struct task_struct *task, long request, > > return retval; > > } > > > > + if (unlikely(task == task->signal->group_exec_task)) { > > + retval = down_write_killable(&task->signal->exec_update_lock); > > + if (retval) > > + return retval; > > + > > + scoped_guard (task_lock, task) { > > + struct linux_binprm *bprm = task->signal->exec_bprm; > > + const struct cred __rcu *old_cred = task->real_cred; > > + struct mm_struct *old_mm = task->mm; > > + > > + rcu_assign_pointer(task->real_cred, bprm->cred); > > + task->mm = bprm->mm; > > + retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS); > > + rcu_assign_pointer(task->real_cred, old_cred); > > + task->mm = old_mm; > > + } > > This is the most problematic change which I can't review... > > Firstly, it changes task->mm/real_cred for __ptrace_may_access() and this > looks dangerous to me. Yeah, that is not ok. This is effectively override_creds for real_cred and that is not a pattern I want to see us establish at all! Temporary credential overrides for the subjective credentials is already terrible but at least we have the explicit split between real_cred and cred expressely for that. So no, that's not an acceptable solution. > > Say, current_is_single_threaded() called by another CLONE_VM process can > miss group_exec_task and falsely return true. Probably not that bad, in > this case old_mm should go away soon, but still... > > And I don't know if this can fool the users of task_cred_xxx/__task_cred > somehow. > > Or. check_unsafe_exec() sets LSM_UNSAFE_PTRACE if ptrace. Is it safe to > ptrace the execing task after that? I have no idea what the security hooks > can do... > > Again, can't review this part. > > Oleg. >