From: Ilya Leoshkevich <iii@linux.ibm.com>
To: "Richard Henderson" <richard.henderson@linaro.org>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Alex Bennée" <alex.bennee@linaro.org>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>
Cc: qemu-devel@nongnu.org
Subject: Re: [PATCH 00/18] Stop all qemu-cpu threads on a breakpoint
Date: Thu, 10 Oct 2024 00:01:48 +0200
Message-ID: <991185a8191694b489cf7d2374ffa99fc6a5ee82.camel@linux.ibm.com>
In-Reply-To: <94ebebf2-e775-4fd2-8fcf-921610261a7e@linaro.org>

On Tue, 2024-10-08 at 11:17 -0700, Richard Henderson wrote:
> On 10/5/24 13:35, Ilya Leoshkevich wrote:
> > > How can we handle the long-running syscalls?
> > > Just waiting sounds unsatisfying.
> > > Sending a reserved host signal may alter the guest's behaviour if a
> > > syscall like pause() is interrupted.
> > > What do you think about SIGSTOP-ping the "in_syscall" threads?
> > > A quick experiment shows that it should be completely invisible to
> > > the guest - the following program continues to run after
> > > SIGSTOP/SIGCONT:
> > >
> > > #include <sys/syscall.h>
> > > #include <unistd.h>
> > > int main(void) { syscall(__NR_pause); }
> >
> > Hmm, no, that won't work: SIGSTOP would stop all threads.
> >
> > So I wonder if reserving a host signal for interrupting "in_syscall"
> > threads would be an acceptable tradeoff?
>
> Could work, yes. We already steal SIGRTMIN for guest abort (to
> distinguish from host abort), and remap guest __SIGRTMIN to host
> SIGRTMIN+1. Grabbing SIGRTMIN+1 should work ok, modulo the existing
> problem of presenting the guest with an incomplete set of signals.
>
> I've wondered from time to time about multiplexing signals in this
> space, but I think that runs afoul of having a consistent mapping for
> interprocess signaling.
>
>
> r~

I tried to think through how this would work in conjunction with
start_exclusive(), and there is one problem I don't see a good solution
for. Maybe you will have an idea.

The way I'm thinking of implementing this is as follows:

- Reserve the host's SIGRTMIN+1 and tweak host_signal_handler() to do
  nothing for this signal.

- In gdb_try_stop(), call start_exclusive(). After it returns, some
  threads will be parked in exclusive_idle(). Some other threads will
  be on their way to getting parked, and this needs to actually happen
  before gdb_try_stop() can proceed. For example, the ones that are
  executing handle_pending_signal() may change memory and CPU state.
  IIUC start_exclusive() will not wait for them, because they are not
  "running". I think a global counter protected by qemu_cpu_list_lock
  and paired with a new condition variable should be enough for this.

- Threads executing long-running syscalls will need to be interrupted
  by SIGRTMIN+1. These syscalls will return -EINTR and will need
  to be manually restarted so as not to disturb poorly written guests.
  This needs to happen only if there are no pending guest signals.

- Here is a minor problem: how to identify threads which need to be
  signalled? in_syscall may not be enough. But maybe signalling all
  threads won't hurt too much. The parked ones won't notice anyway.

- But here is the major problem: what if we signal a thread just before
  it starts executing a long-running syscall? Such a thread will be
  stuck and we'll need to signal it again. But how to determine that
  this needs to be done?

An obvious solution is to signal all threads in a loop with a 0.1s
delay until the counter reaches n_threads. But it's quite ugly.

Ideally SIGRTMIN+1 should be blocked most of the time. Then we should
identify all places where long-running syscalls may be invoked and
unblock SIGRTMIN+1 atomically with executing them. But I'm not aware
of such a mechanism (I have an extremely vague recollection that
someone managed to abuse rseq for this, but we shouldn't be relying
on rseq being available anyway).