From: Thomas Gleixner <tglx@linutronix.de>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Zhang Qiao <zhangqiao22@huawei.com>,
lkml <linux-kernel@vger.kernel.org>,
keescook@chromium.org, Peter Zijlstra <peterz@infradead.org>,
elver@google.com, legion@kernel.org, oleg@redhat.com,
brauner@kernel.org
Subject: Re: Question about kill a process group
Date: Thu, 12 May 2022 00:53:16 +0200 [thread overview]
Message-ID: <87bkw3y943.ffs@tglx> (raw)
In-Reply-To: <87a6bnudfy.fsf@email.froward.int.ebiederm.org>
On Wed, May 11 2022 at 13:33, Eric W. Biederman wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
>> So unless the number of PIDs for a user is limited this _is_ an
>> unpriviledged DoS vector.
>
> After having slept on this a bit it finally occurred to me the
> semi-obvious solution to this issue is to convert tasklist_lock
> from a rw-spinlock to rw-semaphore. The challenge is finding
> the users (tty layer?) that generate signals from interrupt
> context and redirect that signal generation.
From my outdated notes where I looked at this before:
[soft]interrupt context which acquires tasklist_lock:
sysrq-e send_sig_all()
sysrq-i send_sig_all()
sysrq-n normalize_rt_tasks()
tasklist_lock nesting into other locks:
fs/fcntl.c: send_sigio(), send_sigurg()
send_sigurg() is called from the network stack ...
Some very obscure stuff in arch/ia64/kernel/mca.c which is called
from a DIE notifier.
Plus quite a bunch of read_lock() instances which nest inside
rcu_read_lock() held sections.
This is probably incomplete, but the scope of the problem has been
greatly reduced vs. the point where I looked at it last time a couple of
years ago. But that's still a herculean task.
> Once signals holding tasklist_lock are no longer generated from
> interrupt context irqs no longer need to be disabled and
> after verifying tasklist_lock isn't held under any other spinlocks
> it can be converted to a semaphore.
Going to take a while. :)
> It won't help the signal delivery times, but it should reduce
> the effect on the rest of the system, and prevent watchdogs from
> firing.
The signal delivery time itself is the least of the worries, but this
still prevents any other operations which require tasklist_lock from
making progress for quite some time, i.e. fork/exec for unrelated
processes/users will have to wait too. So you converted the 'visible'
DoS to an 'invisible' one.
The real problem is that the scope of tasklist_lock is too broad for
most use cases. That does not change when you actually can convert it to
a rwsem. The underlying problem still persists.
Let's take a step back and look what most sane use cases (sysrq-* is not
in that category) require:
Preventing that tasks are added or removed
Do they require that any task is added or removed? No.
They require to prevent add/remove for the intended scope.
That's the thing we need to focus on: reducing the protection scope.
If we can segment the protection for the required scope of e.g. kill(2)
then we still can let unrelated processes/tasks make progress and just
inflict the damage on the affected portion of processes/tasks.
For example:
read_lock(&tasklist_lock);
for_each_process(p) {
if (task_pid_vnr(p) > 1 &&
!same_thread_group(p, current)) {
group_send_sig_info(...., p);
}
}
read_unlock(&tasklist_lock);
same_thread_group() does:
return p->signal == current->signal;
Ideally we can do:
read_lock(&tasklist_lock);
prevent_add_remove(current->signal);
read_unlock(&tasklist_lock);
rcu_read_lock();
for_each_process(p) {
if (task_pid_vnr(p) > 1 &&
!same_thread_group(p, current)) {
group_send_sig_info(...., p);
}
}
rcu_read_unlock();
allow_add_remove(current->signal);
Where prevent_add_remove() sets a state which has to be waited for to be
cleared by anything which wants to add/remove a task in that scope or
change $relatedtask->signal until allow_add_remove() removes that
blocker. I'm sure it's way more complicated, but you get the idea.
If we find a solution to this scope reduction problem, then it will not
only squash the issue which started this discussion. This will have a
benefit in general.
Thanks,
tglx
next prev parent reply other threads:[~2022-05-11 22:53 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-29 8:07 Question about kill a process group Zhang Qiao
2022-04-02 2:22 ` Zhang Qiao
2022-04-13 1:56 ` Zhang Qiao
2022-04-13 15:47 ` Eric W. Biederman
2022-04-14 11:40 ` Zhang Qiao
2022-04-21 16:12 ` Eric W. Biederman
2022-04-28 2:05 ` Zhang Qiao
2022-04-28 12:33 ` Thomas Gleixner
2022-05-11 18:33 ` Eric W. Biederman
2022-05-11 22:53 ` Thomas Gleixner [this message]
2022-05-12 18:23 ` Eric W. Biederman
2022-09-26 7:32 ` Zhang Qiao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87bkw3y943.ffs@tglx \
--to=tglx@linutronix.de \
--cc=brauner@kernel.org \
--cc=ebiederm@xmission.com \
--cc=elver@google.com \
--cc=keescook@chromium.org \
--cc=legion@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=zhangqiao22@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.