* [PATCH 1/1] signal: on exit skip waiting for an ack from the tracer if it is frozen
@ 2024-06-28 17:32 Suren Baghdasaryan
2024-06-29 13:12 ` Oleg Nesterov
0 siblings, 1 reply; 5+ messages in thread
From: Suren Baghdasaryan @ 2024-06-28 17:32 UTC (permalink / raw)
To: akpm
Cc: oleg, mhocko, brauner, tandersen, bigeasy, vincent.whitchurch,
ardb, linux-kernel, Suren Baghdasaryan, Martin Liu, Minchan Kim
When a process is being killed or exiting and it has a tracer, it will
notify the tracer and wait for an ack from the tracer to proceed. However
if the tracer is frozen, this ack will not arrive until the tracer gets
thawed. This poses a problem especially during memory pressure because
resources of the process are not released.
Things become even more interesting if OOM killer picks such tracee
and adds it into oom_victims. oom_victims counter will get incremented
and stay that way until tracee exits. In the meantime, if the system
tries to go into suspend, it will call oom_killer_disable() after
freezing all processes. That call will fail due to positive oom_victims,
but not until freeze_timeout_msecs passes. For the whole duration of the
freeze_timeout_msecs (20sec by default) the system will appear
unresponsive.
To fix this problem, skip the ack waiting step in the tracee when it's
exiting and the tracer is frozen. Per ptrace(2) manual, the tracer
cannot assume that the ptrace-stopped tracee exists. Therefore this
change does not break any valid assumptions.
Debugged-by: Martin Liu <liumartin@google.com>
Debugged-by: Minchan Kim <minchan@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
kernel/signal.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/kernel/signal.c b/kernel/signal.c
index 1f9dd41c04be..dd9c18fdaaa5 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2320,6 +2320,19 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
do_notify_parent_cldstop(current, false, why);
+ /*
+ * If tracer is frozen, it won't ack until it gets unfrozen and if the
+ * tracee is exiting this means its resources do not get freed until
+ * the tracer is thawed. Skip waiting for the tracer. Per ptrace(2)
+ * manual, the tracer cannot assume that the ptrace-stopped tracee
+ * exists, so exiting now should not be an issue.
+ */
+ if (current->ptrace && (exit_code >> 8) == PTRACE_EVENT_EXIT &&
+ cgroup_task_frozen(current->parent)) {
+ read_unlock(&tasklist_lock);
+ goto skip_wait;
+ }
+
/*
* The previous do_notify_parent_cldstop() invocation woke ptracer.
* One a PREEMPTION kernel this can result in preemption requirement
@@ -2356,6 +2369,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
schedule();
cgroup_leave_frozen(true);
+skip_wait:
/*
* We are back. Now reacquire the siglock before touching
* last_siginfo, so that we are sure to have synchronized with
base-commit: 6c0483dbfe7223f2b8390e3d5fe942629d3317b7
--
2.45.2.803.g4e1b14247a-goog
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] signal: on exit skip waiting for an ack from the tracer if it is frozen
2024-06-28 17:32 [PATCH 1/1] signal: on exit skip waiting for an ack from the tracer if it is frozen Suren Baghdasaryan
@ 2024-06-29 13:12 ` Oleg Nesterov
2024-06-30 19:12 ` Suren Baghdasaryan
0 siblings, 1 reply; 5+ messages in thread
From: Oleg Nesterov @ 2024-06-29 13:12 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, mhocko, brauner, tandersen, bigeasy, vincent.whitchurch,
ardb, linux-kernel, Martin Liu, Minchan Kim
Oh, PTRACE_EVENT_EXIT again. I can't even recall how many times
I mentioned it is broken by design but any possible change is user
visible...
But I don't really understand this patch.
On 06/28, Suren Baghdasaryan wrote:
>
> When a process is being killed or exiting and it has a tracer, it will
> notify the tracer and wait for an ack from the tracer to proceed. However
> if the tracer is frozen, this ack will not arrive until the tracer gets
> thawed. This poses a problem especially during memory pressure because
> resources of the process are not released.
Yes. But how does this differ from situation when the tracer is not
frozen but just sleeps? Or it is traced too and its tracer is frozen?
> Things become even more interesting if OOM killer picks such tracee
> and adds it into oom_victims. oom_victims counter will get incremented
> and stay that way until tracee exits. In the meantime, if the system
> tries to go into suspend, it will call oom_killer_disable() after
> freezing all processes.
Confused... suspend doesn't use cgroup_freeze/etc, so it seems your
patch should check frozen() rather than cgroup_task_frozen() ?
And what if try_to_freeze_tasks() does freeze_task(tracee->parent)
right after the check in ptrace_stop() below?
I think it would better to simply change ptrace_stop() to check TIF_MEMDIE
along with __fatal_signal_pending() and return in this case.
Although TIF_MEMDIE is per-thread... perhaps signal->oom_mm != NULL?
Michal, what do you think?
Of course, this won't fix all problems.
Oleg.
> That call will fail due to positive oom_victims,
> but not until freeze_timeout_msecs passes. For the whole duration of the
> freeze_timeout_msecs (20sec by default) the system will appear
> unresponsive.
> To fix this problem, skip the ack waiting step in the tracee when it's
> exiting and the tracer is frozen. Per ptrace(2) manual, the tracer
> cannot assume that the ptrace-stopped tracee exists. Therefore this
> change does not break any valid assumptions.
>
> Debugged-by: Martin Liu <liumartin@google.com>
> Debugged-by: Minchan Kim <minchan@google.com>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
> kernel/signal.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 1f9dd41c04be..dd9c18fdaaa5 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2320,6 +2320,19 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
> if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
> do_notify_parent_cldstop(current, false, why);
>
> + /*
> + * If tracer is frozen, it won't ack until it gets unfrozen and if the
> + * tracee is exiting this means its resources do not get freed until
> + * the tracer is thawed. Skip waiting for the tracer. Per ptrace(2)
> + * manual, the tracer cannot assume that the ptrace-stopped tracee
> + * exists, so exiting now should not be an issue.
> + */
> + if (current->ptrace && (exit_code >> 8) == PTRACE_EVENT_EXIT &&
> + cgroup_task_frozen(current->parent)) {
> + read_unlock(&tasklist_lock);
> + goto skip_wait;
> + }
> +
> /*
> * The previous do_notify_parent_cldstop() invocation woke ptracer.
> * One a PREEMPTION kernel this can result in preemption requirement
> @@ -2356,6 +2369,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
> schedule();
> cgroup_leave_frozen(true);
>
> +skip_wait:
> /*
> * We are back. Now reacquire the siglock before touching
> * last_siginfo, so that we are sure to have synchronized with
>
> base-commit: 6c0483dbfe7223f2b8390e3d5fe942629d3317b7
> --
> 2.45.2.803.g4e1b14247a-goog
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] signal: on exit skip waiting for an ack from the tracer if it is frozen
2024-06-29 13:12 ` Oleg Nesterov
@ 2024-06-30 19:12 ` Suren Baghdasaryan
2024-07-03 16:48 ` Oleg Nesterov
0 siblings, 1 reply; 5+ messages in thread
From: Suren Baghdasaryan @ 2024-06-30 19:12 UTC (permalink / raw)
To: Oleg Nesterov
Cc: akpm, mhocko, brauner, tandersen, bigeasy, vincent.whitchurch,
ardb, linux-kernel, Martin Liu, Minchan Kim
On Sat, Jun 29, 2024 at 6:14 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> Oh, PTRACE_EVENT_EXIT again. I can't even recall how many times
> I mentioned it is broken by design but any possible change is user
> visible...
>
> But I don't really understand this patch.
>
> On 06/28, Suren Baghdasaryan wrote:
> >
> > When a process is being killed or exiting and it has a tracer, it will
> > notify the tracer and wait for an ack from the tracer to proceed. However
> > if the tracer is frozen, this ack will not arrive until the tracer gets
> > thawed. This poses a problem especially during memory pressure because
> > resources of the process are not released.
>
> Yes. But how does this differ from situation when the tracer is not
> frozen but just sleeps?
Appreciate the feedback, Oleg!
If the sleep is limited, I guess it's not that bad. With frozen
tracer, the tracee becomes unkillable potentially forever.
> Or it is traced too and its tracer is frozen?
Good question. I'm not an expert in ptracing, so TBH I don't know what
will happen. Will the tracer wake up, acknowledge the tracee and then
try to notify its frozen tracer or will the whole chain stall because
the ultimate parent is frozen?
>
> > Things become even more interesting if OOM killer picks such tracee
> > and adds it into oom_victims. oom_victims counter will get incremented
> > and stay that way until tracee exits. In the meantime, if the system
> > tries to go into suspend, it will call oom_killer_disable() after
> > freezing all processes.
>
> Confused... suspend doesn't use cgroup_freeze/etc, so it seems your
> patch should check frozen() rather than cgroup_task_frozen() ?
Ah, good point. In my particular case the tracer is frozen due to its
cgroup being frozen but frozen() would cover a more general case.
> And what if try_to_freeze_tasks() does freeze_task(tracee->parent)
> right after the check in ptrace_stop() below?
Uh, yeah. At this point we already sent the notification to the
tracer. Maybe we can ensure that the tracer acks all tracee
notifications before going into the freezer?
>
> I think it would better to simply change ptrace_stop() to check TIF_MEMDIE
> along with __fatal_signal_pending() and return in this case.
I think this would not fix the case we are experiencing. In our case
the tracee is killed from the userspace (TIF_MEMDIE is not set yet),
gets stuck in ptrace_stop() waiting for an ack from the tracer and
then is picked up by OOM-killer with the abovementioned consequences.
IOW, checking for TIF_MEMDIE in ptrace_stop() will not prevent tracee
from waiting for the ack from a frozen tracer.
>
> Although TIF_MEMDIE is per-thread... perhaps signal->oom_mm != NULL?
>
> Michal, what do you think?
>
> Of course, this won't fix all problems.
As I mentioned, I'm not an expert in ptrace, so I'll gladly try any
better solution if one is proposed.
Thanks,
Suren.
>
> Oleg.
>
> > That call will fail due to positive oom_victims,
> > but not until freeze_timeout_msecs passes. For the whole duration of the
> > freeze_timeout_msecs (20sec by default) the system will appear
> > unresponsive.
> > To fix this problem, skip the ack waiting step in the tracee when it's
> > exiting and the tracer is frozen. Per ptrace(2) manual, the tracer
> > cannot assume that the ptrace-stopped tracee exists. Therefore this
> > change does not break any valid assumptions.
> >
> > Debugged-by: Martin Liu <liumartin@google.com>
> > Debugged-by: Minchan Kim <minchan@google.com>
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> > kernel/signal.c | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index 1f9dd41c04be..dd9c18fdaaa5 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -2320,6 +2320,19 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
> > if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
> > do_notify_parent_cldstop(current, false, why);
> >
> > + /*
> > + * If tracer is frozen, it won't ack until it gets unfrozen and if the
> > + * tracee is exiting this means its resources do not get freed until
> > + * the tracer is thawed. Skip waiting for the tracer. Per ptrace(2)
> > + * manual, the tracer cannot assume that the ptrace-stopped tracee
> > + * exists, so exiting now should not be an issue.
> > + */
> > + if (current->ptrace && (exit_code >> 8) == PTRACE_EVENT_EXIT &&
> > + cgroup_task_frozen(current->parent)) {
> > + read_unlock(&tasklist_lock);
> > + goto skip_wait;
> > + }
> > +
> > /*
> > * The previous do_notify_parent_cldstop() invocation woke ptracer.
> > * One a PREEMPTION kernel this can result in preemption requirement
> > @@ -2356,6 +2369,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
> > schedule();
> > cgroup_leave_frozen(true);
> >
> > +skip_wait:
> > /*
> > * We are back. Now reacquire the siglock before touching
> > * last_siginfo, so that we are sure to have synchronized with
> >
> > base-commit: 6c0483dbfe7223f2b8390e3d5fe942629d3317b7
> > --
> > 2.45.2.803.g4e1b14247a-goog
> >
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] signal: on exit skip waiting for an ack from the tracer if it is frozen
2024-06-30 19:12 ` Suren Baghdasaryan
@ 2024-07-03 16:48 ` Oleg Nesterov
2024-07-03 18:23 ` Suren Baghdasaryan
0 siblings, 1 reply; 5+ messages in thread
From: Oleg Nesterov @ 2024-07-03 16:48 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, mhocko, brauner, tandersen, bigeasy, vincent.whitchurch,
ardb, linux-kernel, Martin Liu, Minchan Kim
Suren, I am sorry for the late reply,
On 06/30, Suren Baghdasaryan wrote:
>
> > I think it would better to simply change ptrace_stop() to check TIF_MEMDIE
> > along with __fatal_signal_pending() and return in this case.
>
> I think this would not fix the case we are experiencing. In our case
> the tracee is killed from the userspace (TIF_MEMDIE is not set yet),
OK, I misunderstood the problem.
> gets stuck in ptrace_stop() waiting for an ack from the tracer and
> then is picked up by OOM-killer with the abovementioned consequences.
and __task_will_free_mem() returns true if SIGNAL_GROUP_EXIT is set...
Nevermind.
> > Of course, this won't fix all problems.
>
> As I mentioned, I'm not an expert in ptrace, so I'll gladly try any
> better solution if one is proposed.
I do not see any solution, sorry.
ptrace doesn't allow to intercept/nack SIGKILL, but at the same time it
happily allows the killed tracee to sleep in PTRACE_EVENT_EXIT. And even
another SIGKILL/whatever can't wake the tracee up.
This is historical behaviour, I do not see how can we change it. Any
change will break something.
Oleg.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] signal: on exit skip waiting for an ack from the tracer if it is frozen
2024-07-03 16:48 ` Oleg Nesterov
@ 2024-07-03 18:23 ` Suren Baghdasaryan
0 siblings, 0 replies; 5+ messages in thread
From: Suren Baghdasaryan @ 2024-07-03 18:23 UTC (permalink / raw)
To: Oleg Nesterov
Cc: akpm, mhocko, brauner, tandersen, bigeasy, vincent.whitchurch,
ardb, linux-kernel, Martin Liu, Minchan Kim
On Wed, Jul 3, 2024 at 9:50 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> Suren, I am sorry for the late reply,
>
> On 06/30, Suren Baghdasaryan wrote:
> >
> > > I think it would better to simply change ptrace_stop() to check TIF_MEMDIE
> > > along with __fatal_signal_pending() and return in this case.
> >
> > I think this would not fix the case we are experiencing. In our case
> > the tracee is killed from the userspace (TIF_MEMDIE is not set yet),
>
> OK, I misunderstood the problem.
>
> > gets stuck in ptrace_stop() waiting for an ack from the tracer and
> > then is picked up by OOM-killer with the abovementioned consequences.
>
> and __task_will_free_mem() returns true if SIGNAL_GROUP_EXIT is set...
> Nevermind.
>
> > > Of course, this won't fix all problems.
> >
> > As I mentioned, I'm not an expert in ptrace, so I'll gladly try any
> > better solution if one is proposed.
>
> I do not see any solution, sorry.
Ok, in any case, thanks for the feedback!
Do you think if I resolve the race you mentioned (what if
try_to_freeze_tasks() does freeze_task(tracee->parent) right after the
check in ptrace_stop()) and replace cgroup_task_frozen() with
frozen(), this solution would be acceptable?
Your question about a tracer being traced itself and its tracer being
frozen *I think* would be quite rare. I don't think it's a common
pattern to trace a process which in turn is tracing another one. Or am
I wrong?
Thanks,
Suren.
>
> ptrace doesn't allow to intercept/nack SIGKILL, but at the same time it
> happily allows the killed tracee to sleep in PTRACE_EVENT_EXIT. And even
> another SIGKILL/whatever can't wake the tracee up.
>
> This is historical behaviour, I do not see how can we change it. Any
> change will break something.
>
> Oleg.
>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-07-03 18:23 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-28 17:32 [PATCH 1/1] signal: on exit skip waiting for an ack from the tracer if it is frozen Suren Baghdasaryan
2024-06-29 13:12 ` Oleg Nesterov
2024-06-30 19:12 ` Suren Baghdasaryan
2024-07-03 16:48 ` Oleg Nesterov
2024-07-03 18:23 ` Suren Baghdasaryan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox