From mboxrd@z Thu Jan 1 00:00:00 1970
From: Oleg Nesterov
Subject: Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace
Date: Thu, 1 Nov 2018 15:48:05 +0100
Message-ID: <20181101144804.GD23232@redhat.com>
References: <20181029224031.29809-1-tycho@tycho.ws>
 <20181029224031.29809-2-tycho@tycho.ws>
 <20181030143235.GA3385@redhat.com>
 <20181030153231.GB7343@cisco>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Content-Disposition: inline
In-Reply-To: <20181030153231.GB7343@cisco>
Sender: linux-kernel-owner@vger.kernel.org
To: Tycho Andersen
Cc: Kees Cook, Andy Lutomirski, "Eric W . Biederman", "Serge E . Hallyn",
 Christian Brauner, Tyler Hicks, Akihiro Suda, Aleksa Sarai,
 linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org,
 linux-api@vger.kernel.org
List-Id: linux-api@vger.kernel.org

On 10/30, Tycho Andersen wrote:
>
> > I am not sure I understand the value of signaled/SECCOMP_NOTIF_FLAG_SIGNALED...
> > I mean, why it is actually useful?
> >
> > Sorry if this was already discussed.
>
> :) no problem, many people have complained about this. This is an
> implementation of Andy's suggestion here:
> https://lkml.org/lkml/2018/3/15/1122
>
> You can see some more detailed discussion here:
> https://lkml.org/lkml/2018/9/21/138

Cough, sorry, I simply can't understand what you are talking about ;)
It seems that I need to read all the previous emails... So let me ask
a stupid question below.

> > But my main concern is that either way wait_for_completion_killable() allows
> > to trivially create a process which doesn't react to SIGSTOP, not good...
> >
> > Note also that this can happen if, say, both the tracer and tracee run in the
> > same process group and SIGSTOP is sent to their pgid, if the tracer gets the
> > signal first the tracee won't stop.
> >
> > Or freezer. try_to_freeze_tasks() can fail if it freezes the tracer before
> > it does SECCOMP_IOCTL_NOTIF_SEND.
>
> I think in general the way this is intended to be used these things
> wouldn't happen.

Why?

> was malicious and had the ability to create a user namespace to
> exhaust pids this way,

Not sure I understand how this connects to my question... never mind.

> so perhaps we should drop this part of the
> patch. I have no real need for it, but perhaps Andy can elaborate?

Yes, I think it would be nice to avoid wait_for_completion_killable().

So please help me to understand the problem. Once again, why can't
seccomp_do_user_notification() use wait_for_completion_interruptible() only?

This is called before the task actually starts the syscall, so
-ERESTARTNOINTR if signal_pending() can't hurt.

Now let's suppose seccomp_do_user_notification() simply does

	err = wait_for_completion_interruptible(&n.ready);

	if (err < 0 && state != SECCOMP_NOTIFY_REPLIED) {
		syscall_set_return_value(ERESTARTNOINTR);
		list_del(&n.list);
		return -1;
	}

(I am ignoring the locking/etc). Now the obvious problem is that the listener
doing SECCOMP_IOCTL_NOTIF_SEND can't distinguish -ENOENT from the case when the
tracee was killed, yes? Is it that important?

Any other problem?

Oleg.
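
The decision made by the snippet above can be modeled in user space, purely as an illustration. This is a minimal sketch, not kernel code: the enum names and decide_after_wait() are hypothetical, and wait_err only stands in for the return value of wait_for_completion_interruptible() (0 on completion, negative when a signal interrupts the wait). Locking is ignored here too.

```c
/* Hypothetical user-space model of the check in the snippet above. */
enum notify_state {
	NOTIFY_INIT,     /* notification queued, not yet read by the listener */
	NOTIFY_SENT,     /* listener has read it but not answered */
	NOTIFY_REPLIED   /* listener answered via SECCOMP_IOCTL_NOTIF_SEND */
};

enum notify_action {
	ACTION_USE_REPLY, /* return the listener's value/error to the tracee */
	ACTION_RESTART    /* unlink the notification and restart the syscall */
};

/*
 * wait_err models wait_for_completion_interruptible(): 0 means the
 * completion fired, a negative value means a signal arrived first.
 * A reply that raced with the signal still wins, as in the snippet.
 */
static enum notify_action decide_after_wait(int wait_err,
					    enum notify_state state)
{
	if (wait_err < 0 && state != NOTIFY_REPLIED)
		return ACTION_RESTART;
	return ACTION_USE_REPLY;
}
```

In this model an interrupted wait with no reply yet maps to the -ERESTARTNOINTR path, so a listener that answers afterwards finds the notification gone and gets -ENOENT, which is the ambiguity asked about above.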