From: Oleg Nesterov <oleg@redhat.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-kernel@vger.kernel.org, Ben Segall <bsegall@google.com>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Ingo Molnar <mingo@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>, Mel Gorman <mgorman@suse.de>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Thomas Gleixner <tglx@linutronix.de>,
Vincent Guittot <vincent.guittot@linaro.org>
Subject: Re: [PATCH v2] ptrace: fix ptrace vs tasklist_lock race on PREEMPT_RT.
Date: Mon, 4 Apr 2022 18:13:39 +0200 [thread overview]
Message-ID: <20220404161339.GA21531@redhat.com> (raw)
In-Reply-To: <YkW55u6u2fo5QmV7@linutronix.de>
Cough. Somehow I can hardly understand v2. For example, if we fix
wait_task_inactive() correctly, then why ptrace_freeze_traced()
should take saved_state into account? And how can unfreeze_traced
hit saved_state == __TASK_TRACED ?
I will read this patch tommorrow again, I have a fever today, quite
possibly I missed something...
But in any case, I think you need to update the changelog. Yes sure,
ptrace/wait_task_inactive doesn't play well if CONFIG_PREEMPT_RT,
but which exactly problem this patch tries to solve? and how?
And btw... What does the "Fix for ptrace_unfreeze_traced() by Oleg
Nesterov" below mean? This all looks as if there is something else
I am not aware of.
On 03/31, Sebastian Andrzej Siewior wrote:
>
> As explained by Alexander Fyodorov <halcy@yandex.ru>:
>
> |read_lock(&tasklist_lock) in ptrace_stop() is converted to sleeping
> |lock on a PREEMPT_RT kernel, and it can remove __TASK_TRACED from
> |task->__state (by moving it to task->saved_state). If parent does
> |wait() on child followed by a sys_ptrace call, the following race can
> |happen:
> |
> |- child sets __TASK_TRACED in ptrace_stop()
> |- parent does wait() which eventually calls wait_task_stopped() and returns
> | child's pid
> |- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves
> | __TASK_TRACED flag to saved_state
> |- parent calls sys_ptrace, which calls ptrace_check_attach() and
> | wait_task_inactive()
>
> The patch is based on his initial patch where an additional check is
> added in case the __TASK_TRACED moved to ->saved_state. The pi_lock is
> acquired to have stable view on ->__state and ->saved_state.
>
> wait_task_inactive() needs to check both task states while waiting for the
> expected task state. Should the expected task state be in ->saved_state then
> the task is blocked on a sleeping lock. In this case wait_task_inactive() needs
> to wait until the lock situtation has been resolved (the expected state is in
> ->__state). This ensures that the task is idle and does not wakeup as part of
> lock resolving and races for instance with __switch_to_xtra() while the
> debugger clears TIF_BLOCKSTEP() (noted by Oleg Nesterov).
>
> [ Fix for ptrace_unfreeze_traced() by Oleg Nesterov ]
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> v1…v2:
> - Use also ->saved_state in task_state_match_and_set().
> - Wait in wait_task_inactive() until the desired task state is in
> ->__state so that the task won't wake up a as part of lock
> resolving. Pointed out by Oleg Nesterov.
>
> include/linux/sched.h | 128 ++++++++++++++++++++++++++++++++++++++++++++++++--
> kernel/ptrace.c | 25 +++++----
> kernel/sched/core.c | 11 +++-
> 3 files changed, 146 insertions(+), 18 deletions(-)
>
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -118,12 +118,8 @@ struct task_group;
>
> #define task_is_running(task) (READ_ONCE((task)->__state) == TASK_RUNNING)
>
> -#define task_is_traced(task) ((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
> -
> #define task_is_stopped(task) ((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
>
> -#define task_is_stopped_or_traced(task) ((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
> -
> /*
> * Special states are those that do not use the normal wait-loop pattern. See
> * the comment with set_special_state().
> @@ -2009,6 +2005,130 @@ static inline int test_tsk_need_resched(
> return unlikely(test_tsk_thread_flag(tsk,TIF_NEED_RESCHED));
> }
>
> +#ifdef CONFIG_PREEMPT_RT
> +
> +static inline bool task_state_match_and(struct task_struct *tsk, long state)
> +{
> + unsigned long flags;
> + bool match = false;
> +
> + raw_spin_lock_irqsave(&tsk->pi_lock, flags);
> + if (READ_ONCE(tsk->__state) & state)
> + match = true;
> + else if (tsk->saved_state & state)
> + match = true;
> + raw_spin_unlock_irqrestore(&tsk->pi_lock, flags);
> + return match;
> +}
> +
> +static inline int __task_state_match_eq(struct task_struct *tsk, long state)
> +{
> + int match = 0;
> +
> + if (READ_ONCE(tsk->__state) == state)
> + match = 1;
> + else if (tsk->saved_state == state)
> + match = -1;
> +
> + return match;
> +}
> +
> +static inline int task_state_match_eq(struct task_struct *tsk, long state)
> +{
> + unsigned long flags;
> + int match;
> +
> + raw_spin_lock_irqsave(&tsk->pi_lock, flags);
> + match = __task_state_match_eq(tsk, state);
> + raw_spin_unlock_irqrestore(&tsk->pi_lock, flags);
> + return match;
> +}
> +
> +static inline bool task_state_match_and_set(struct task_struct *tsk, long state,
> + long new_state)
> +{
> + unsigned long flags;
> + bool match = false;
> +
> + raw_spin_lock_irqsave(&tsk->pi_lock, flags);
> + if (READ_ONCE(tsk->__state) & state) {
> + WRITE_ONCE(tsk->__state, new_state);
> + match = true;
> + } else if (tsk->saved_state & state) {
> + tsk->saved_state = new_state;
> + match = true;
> + }
> + raw_spin_unlock_irqrestore(&tsk->pi_lock, flags);
> + return match;
> +}
> +
> +static inline bool task_state_match_eq_set(struct task_struct *tsk, long state,
> + long new_state)
> +{
> + unsigned long flags;
> + bool match = false;
> +
> + raw_spin_lock_irqsave(&tsk->pi_lock, flags);
> + if (READ_ONCE(tsk->__state) == state) {
> + WRITE_ONCE(tsk->__state, new_state);
> + match = true;
> + } else if (tsk->saved_state == state) {
> + tsk->saved_state = new_state;
> + match = true;
> + }
> + raw_spin_unlock_irqrestore(&tsk->pi_lock, flags);
> + return match;
> +}
> +
> +#else
> +
> +static inline bool task_state_match_and(struct task_struct *tsk, long state)
> +{
> + return READ_ONCE(tsk->__state) & state;
> +}
> +
> +static inline int __task_state_match_eq(struct task_struct *tsk, long state)
> +{
> + return READ_ONCE(tsk->__state) == state;
> +}
> +
> +static inline int task_state_match_eq(struct task_struct *tsk, long state)
> +{
> + return __task_state_match_eq(tsk, state);
> +}
> +
> +static inline bool task_state_match_and_set(struct task_struct *tsk, long state,
> + long new_state)
> +{
> + if (READ_ONCE(tsk->__state) & state) {
> + WRITE_ONCE(tsk->__state, new_state);
> + return true;
> + }
> + return false;
> +}
> +
> +static inline bool task_state_match_eq_set(struct task_struct *tsk, long state,
> + long new_state)
> +{
> + if (READ_ONCE(tsk->__state) == state) {
> + WRITE_ONCE(tsk->__state, new_state);
> + return true;
> + }
> + return false;
> +}
> +
> +#endif
> +
> +static inline bool task_is_traced(struct task_struct *tsk)
> +{
> + return task_state_match_and(tsk, __TASK_TRACED);
> +}
> +
> +static inline bool task_is_stopped_or_traced(struct task_struct *tsk)
> +{
> + return task_state_match_and(tsk, __TASK_STOPPED | __TASK_TRACED);
> +}
> +
> /*
> * cond_resched() and cond_resched_lock(): latency reduction via
> * explicit rescheduling in places that are safe. The return
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -195,10 +195,10 @@ static bool ptrace_freeze_traced(struct
> return ret;
>
> spin_lock_irq(&task->sighand->siglock);
> - if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
> - !__fatal_signal_pending(task)) {
> - WRITE_ONCE(task->__state, __TASK_TRACED);
> - ret = true;
> + if (!looks_like_a_spurious_pid(task) && !__fatal_signal_pending(task)) {
> +
> + ret = task_state_match_and_set(task, __TASK_TRACED,
> + __TASK_TRACED);
> }
> spin_unlock_irq(&task->sighand->siglock);
>
> @@ -207,7 +207,10 @@ static bool ptrace_freeze_traced(struct
>
> static void ptrace_unfreeze_traced(struct task_struct *task)
> {
> - if (READ_ONCE(task->__state) != __TASK_TRACED)
> + bool frozen;
> +
> + if (!IS_ENABLED(CONFIG_PREEMPT_RT) &&
> + READ_ONCE(task->__state) != __TASK_TRACED)
> return;
>
> WARN_ON(!task->ptrace || task->parent != current);
> @@ -217,12 +220,12 @@ static void ptrace_unfreeze_traced(struc
> * Recheck state under the lock to close this race.
> */
> spin_lock_irq(&task->sighand->siglock);
> - if (READ_ONCE(task->__state) == __TASK_TRACED) {
> - if (__fatal_signal_pending(task))
> - wake_up_state(task, __TASK_TRACED);
> - else
> - WRITE_ONCE(task->__state, TASK_TRACED);
> - }
> +
> + frozen = task_state_match_eq_set(task, __TASK_TRACED, TASK_TRACED);
> +
> + if (frozen && __fatal_signal_pending(task))
> + wake_up_state(task, __TASK_TRACED);
> +
> spin_unlock_irq(&task->sighand->siglock);
> }
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3219,6 +3219,8 @@ unsigned long wait_task_inactive(struct
> struct rq *rq;
>
> for (;;) {
> + int match_type = 0;
> +
> /*
> * We do the initial early heuristics without holding
> * any task-queue locks at all. We'll only try to get
> @@ -3239,7 +3241,8 @@ unsigned long wait_task_inactive(struct
> * is actually now running somewhere else!
> */
> while (task_running(rq, p)) {
> - if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
> + if (match_state &&
> + unlikely(!task_state_match_eq(p, match_state)))
> return 0;
> cpu_relax();
> }
> @@ -3254,7 +3257,9 @@ unsigned long wait_task_inactive(struct
> running = task_running(rq, p);
> queued = task_on_rq_queued(p);
> ncsw = 0;
> - if (!match_state || READ_ONCE(p->__state) == match_state)
> + if (match_state)
> + match_type = __task_state_match_eq(p, match_state);
> + if (!match_state || match_type)
> ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
> task_rq_unlock(rq, p, &rf);
>
> @@ -3284,7 +3289,7 @@ unsigned long wait_task_inactive(struct
> * running right now), it's preempted, and we should
> * yield - it could be a while.
> */
> - if (unlikely(queued)) {
> + if (unlikely(queued || match_type < 0)) {
> ktime_t to = NSEC_PER_SEC / HZ;
>
> set_current_state(TASK_UNINTERRUPTIBLE);
next prev parent reply other threads:[~2022-04-04 21:23 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-02 21:04 [PATCH] ptrace: fix ptrace vs tasklist_lock race on PREEMPT_RT Sebastian Andrzej Siewior
2022-03-14 9:27 ` Sebastian Andrzej Siewior
2022-03-14 18:54 ` Oleg Nesterov
2022-03-15 8:31 ` Sebastian Andrzej Siewior
2022-03-15 14:29 ` Oleg Nesterov
2022-03-16 8:23 ` Sebastian Andrzej Siewior
2022-03-31 14:25 ` [PATCH v2] " Sebastian Andrzej Siewior
2022-04-04 16:13 ` Oleg Nesterov [this message]
2022-04-05 8:34 ` Oleg Nesterov
2022-04-05 10:10 ` Peter Zijlstra
2022-04-05 10:29 ` Oleg Nesterov
2022-04-05 11:28 ` Peter Zijlstra
2022-04-07 12:13 ` Peter Zijlstra
2022-04-07 17:51 ` Eric W. Biederman
2022-04-07 22:50 ` Eric W. Biederman
2022-04-08 9:09 ` Peter Zijlstra
2022-04-08 19:40 ` Eric W. Biederman
2022-04-08 20:06 ` Peter Zijlstra
2022-04-11 11:35 ` Peter Zijlstra
2022-04-11 13:44 ` Eric W. Biederman
2022-04-11 17:07 ` Peter Zijlstra
2022-04-12 11:59 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220404161339.GA21531@redhat.com \
--to=oleg@redhat.com \
--cc=bigeasy@linutronix.de \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.