From: Oleg Nesterov <oleg@redhat.com>
To: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>,
Tejun Heo <tj@kernel.org>, Roland McGrath <roland@redhat.com>,
linux-kernel@vger.kernel.org, torvalds@linux-foundation.org,
akpm@linux-foundation.org
Subject: Re: [PATCH 1/1] ptrace: make sure do_wait() won't hang after PTRACE_ATTACH
Date: Sat, 19 Feb 2011 21:06:37 +0100
Message-ID: <20110219200637.GA8662@redhat.com>
In-Reply-To: <20110218213429.GB2066@host1.dyn.jankratochvil.net>

On 02/18, Jan Kratochvil wrote:
>
> On Thu, 17 Feb 2011 17:49:06 +0100, Oleg Nesterov wrote:
> > > - that is to leave the process in
> > > `T (stopped)' without any single PC step.
> >
> > This is not exactly clear to me... I mean "without any single PC step".
> > Why?
>
> Engineers investigating problems of applications SIGSTOP it when it is in the
> critical situation. Then they run gcore, gstack etc. After they are
> satisfied with the analysis they send SIGCONT.
>
> If the application being investigated changes state between the various tools
> it may be confusing as the dumps will not match. But in some cases some
> critical state being investigated may get lost.

Which state can be changed?

Of course, the tracee shouldn't return to user-space before the
stop, and it shouldn't change its registers or anything else that can be
noticed by gcore/gstack/etc. But why can't it take a single PC step
in the kernel?

> > > A new proposal is to preserve the process's `T (stopped)' for
> > > a naive/legacy debugger / ptrace tool doing PTRACE_ATTACH, wait->SIGSTOP,
> > > PTRACE_DETACH(0), incl. GDB doing the "GDB trick" above.
> > > That is after PTRACE_DETACH(0) the process should remain `T (stopped)'
> > > iff the process was `T (stopped)' before PTRACE_ATTACH.
> > > - PTRACE_DETACH(0) should preserve `T (stopped)'.
> >
> > Hmm. OK, but I assume you meant "unless the tracee was resumed in between".
>
> You described the exact behavior of current Fedora/RHEL gdb. But in general
> I do not insist on it, one can for example run an inferior function call
> during the investigation-under-SIGSTOP described above, even in such case one
> still wants to detach the application still in the `T (stopped)' mode.
>
> Detaching process as '(T) stopped' is not such a problem as the app/user can
> send SIGCONT to it. But accidentally unstopping the process during detach
> cannot be fixed or worked around.

Now I am confused. I do not really understand what you want, but
I feel that what you are trying to suggest is not quite right (or I
misunderstood you).

But, once again, I think this should be discussed in another thread.
Yes, we have multiple "it-doesnt-stop-after-detach" problems, but we
can't discuss all problems in this thread.

> > But. Let me remind. PTRACE_DETACH(SIGXXX) does not always work as
> > gdb thinks, SIGXXX can be ignored.
>
> In such case it is a bug.

Well, maybe this is a bug, maybe not. It is a fact ;) Perhaps we should
change this but, again, we need another thread for that discussion.

And, btw, we already discussed this. I explained many times that DETACH/CONT
do not always respect the SIGXXX argument, and you never said this should be
fixed. In particular, I already mentioned in this thread that this argument
has no effect after the jctl stop. I guess this only proves we should discuss
this separately to avoid the confusion ;)

> Due to this bug there is probably the
> tgkill(SIGSTOP)+PTRACE_DETACH(0) used by the "detach-stopped-rhel5"
> ptrace-testsuite testfile, IIUC.
No. I already tried to explain the problems with detach-stopped in my
previous email:
> This works in some kernels and
> does not work in other kernels,
Afaics, this only works in utrace-based kernels.
In the upstream kernel, we have the extra wake_up_state() in ptrace_detach().
And,
> it is "detach-stopped" test in:
But there is another problem which can't be really tested by detach-stopped
(because it detaches when the tracee was already stopped). The
SIGNAL_STOP_DEQUEUED logic is not correct.
In short: the wrong wakeup + the broken SIGNAL_STOP_DEQUEUED logic.
detach-stopped-rhel5 does tkill(SIGSTOP), this fixes the latter. But
it can fail anyway afaics, just the probability is low.
> A process is not in the `T (stopped)' state randomly. AFAIK it is there due
> to an engineer sending it SIGSTOP. Applications themselves do not use SIGSTOP
> themselves to get into `T (stopped)' during their execution.
Well, they do ;) but I think this doesn't matter.
> And if the engineer sent SIGSTOP it was intentional. The engineer does not
> want some tool to accidentally cancel his intentional SIGSTOP. When the
> engineer decides so (s)he can send SIGCONT appropriately.
>
> SIGSTOP I find as a hard stop and thus even the tracers/debuggers of
> the `T (stopped)' process should just get no response from it.
If I understand you correctly, then I agree very much here, and this was
our point.
But I am afraid I may have misunderstood, please see below.
> I do not think
> ptrace is a good tool for some general system monitoring - to see any
> SIGCONT/SIGSTOP deliveries - because ptrace is (a) single-master limited
> (second PTRACE_ATTACH gets EPERM)
This is what I certainly can't understand,
> and (b) ptrace-control is not transparent
> due to the threads/races timing (on `t (tracing stop)').
We are going to try to fix these races,
> Therefore if the debugger sends some SIGSTOP/SIGCONT those should be rather
> ignored for compatibility reasons
Well, I don't think so. In particular they shouldn't be ignored for
compatibility reasons.

Jan, could you please explicitly answer our question? We have numerous
problems with jctl and ptrace. Everything is just broken. And it is broken
by design, which is why it is not easy to fix the code: we should first
discuss what we want to get as a result. Please forget about attach/detach
for the moment. I'll repeat my question:

Suppose that the tracee is 'T (stopped)', because the debugger did
PTRACE_CONT(SIGSTOP), or because the debugger attached to the stopped task.

Currently, PTRACE_CONT(WHATEVER) after that always resumes the tracee,
despite the fact it is still stopped in some sense. This leads to
numerous oddities/bugs.

What we propose is to change this so that the tracee does not run
until it actually receives SIGCONT.

Is it OK for gdb or not?

IOW, to simplify: suppose we have a task in the 'T (stopped)' state. Then
the debugger comes and does

	ptrace(PTRACE_ATTACH);
	ptrace(PTRACE_CONT, 0);

With the current code the tracee runs after that. We want to change
the kernel so that the tracee won't run, but becomes 'T (stopped)'
again. It only runs when it gets SIGCONT.

Do you agree with such a change?

And yes, yes,

	ptrace(PTRACE_ATTACH);
	ptrace(PTRACE_DETACH, 0)

should leave it stopped too, of course.
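
The two sequences above can be tried from user space with a minimal C
sketch like the one below. It assumes Linux with procfs mounted and a
glibc-style ptrace() wrapper, and that the caller is allowed to ptrace its
own child. The helper names (proc_state, wait_for_stop, state_after_attach)
are hypothetical, chosen for illustration only. Which state letter is
reported depends on the kernel being run, which is the behaviour under
discussion, so no particular result is claimed here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One-letter task state from /proc/<pid>/stat ('T', 't', 'S', ...), 0 on error. */
char proc_state(pid_t pid)
{
	char path[64], buf[256], *p;
	char state = 0;
	FILE *f;

	assert(pid > 0);
	snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return 0;
	/* comm is parenthesised and may contain spaces; the state follows the last ')'. */
	if (fgets(buf, sizeof(buf), f) && (p = strrchr(buf, ')')) && p[1] == ' ')
		state = p[2];
	fclose(f);
	return state;
}

/* Poll for a stop report for about two seconds, so that a kernel on which
 * the wait would block cannot hang this example. Returns 1 if a stop was seen. */
int wait_for_stop(pid_t pid)
{
	int status, i;

	for (i = 0; i < 200; i++) {
		pid_t r = waitpid(pid, &status, WNOHANG | WUNTRACED);

		if (r == pid)
			return WIFSTOPPED(status);
		if (r < 0)
			return 0;
		usleep(10000);
	}
	return 0;
}

/* Stop a child with SIGSTOP, attach to it, then do PTRACE_DETACH(0) if
 * detach is non-zero or PTRACE_CONT(0) otherwise. Returns the state letter
 * observed afterwards, or 0 if any step failed. */
char state_after_attach(int detach)
{
	char state = 0;
	int status;
	pid_t r, pid = fork();

	if (pid < 0)
		return 0;
	if (pid == 0) {
		raise(SIGSTOP);		/* become `T (stopped)' */
		for (;;)
			pause();	/* only reached once resumed */
	}
	if (!wait_for_stop(pid))	/* the job control stop itself */
		goto out;
	if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
		goto out;
	if (!wait_for_stop(pid))	/* the stop reported after attach */
		goto out;
	if (ptrace(detach ? PTRACE_DETACH : PTRACE_CONT, pid, NULL, NULL) == -1)
		goto out;
	usleep(100000);			/* give the tracee time to settle */
	state = proc_state(pid);
out:
	kill(pid, SIGKILL);
	do {				/* skip any leftover stop reports */
		r = waitpid(pid, &status, 0);
	} while (r == pid && WIFSTOPPED(status));
	return state;
}
```

Calling state_after_attach(0) exercises the ATTACH + CONT(0) case and
state_after_attach(1) the ATTACH + DETACH(0) case; printing the returned
letter on kernels with and without the proposed change shows whether the
tracee stayed 'T (stopped)'.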

Oleg.