From: Hajime Tazaki <thehajime@gmail.com>
To: benjamin@sipsolutions.net
Cc: linux-um@lists.infradead.org, ricarkol@google.com,
	Liam.Howlett@oracle.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v10 09/13] x86/um: nommu: signal handling
Date: Fri, 27 Jun 2025 22:50:41 +0900
Message-ID: <m2sejl47ke.wl-thehajime@gmail.com>
In-Reply-To: <3b407ed711c5d7e1819da7513c3e320699473b2d.camel@sipsolutions.net>


Hello,

Thanks for the comments on this complicated part of the kernel (signal
handling).

On Wed, 25 Jun 2025 08:20:03 +0900,
Benjamin Berg wrote:
> 
> Hi,
> 
> On Mon, 2025-06-23 at 06:33 +0900, Hajime Tazaki wrote:
> > This commit updates the behavior of signal handling under !MMU
> > environment. It adds the alignment code for signal frame as the frame
> > is used in userspace as-is.
> > 
> > Floating point registers are carefully handled upon entry to and exit
> > from the syscall routine so that signal handlers can read/write the
> > contents of the registers.
> > 
> > It also adds the follow up routine for SIGSEGV as a signal delivery runs
> > in the same stack frame while we have to avoid endless SIGSEGV.
> > 
> > Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
> > ---
> >  arch/um/include/shared/kern_util.h    |   4 +
> >  arch/um/nommu/Makefile                |   2 +-
> >  arch/um/nommu/os-Linux/signal.c       |  13 ++
> >  arch/um/nommu/trap.c                  | 194 ++++++++++++++++++++++++++
> >  arch/x86/um/nommu/do_syscall_64.c     |   6 +
> >  arch/x86/um/nommu/os-Linux/mcontext.c |  11 ++
> >  arch/x86/um/shared/sysdep/mcontext.h  |   1 +
> >  arch/x86/um/shared/sysdep/ptrace.h    |   2 +-
> >  8 files changed, 231 insertions(+), 2 deletions(-)
> >  create mode 100644 arch/um/nommu/trap.c
> > 
> > [SNIP]
> > diff --git a/arch/x86/um/nommu/os-Linux/mcontext.c b/arch/x86/um/nommu/os-Linux/mcontext.c
> > index c4ef877d5ea0..955e7d9f4765 100644
> > --- a/arch/x86/um/nommu/os-Linux/mcontext.c
> > +++ b/arch/x86/um/nommu/os-Linux/mcontext.c
> > @@ -6,6 +6,17 @@
> >  #include <sysdep/mcontext.h>
> >  #include <sysdep/syscalls.h>
> >  
> > +static void __userspace_relay_signal(void)
> > +{
> > + /* XXX: dummy syscall */
> > + __asm__ volatile("call *%0" : : "r"(__kernel_vsyscall), "a"(39) :);
> > +}
> 
> 39 is NR__getpid, I assume?
> 
> The "call *%0" looks like it is code for retpolin, I think this would
> currently just segfault.

# if by retpolin you mean zpoline,

zpoline uses `call *%rax`, so this is not about zpoline.

> > +void set_mc_userspace_relay_signal(mcontext_t *mc)
> > +{
> > + mc->gregs[REG_RIP] = (unsigned long) __userspace_relay_signal;
> > +}
> > +

This is a somewhat scary piece of code, which I wrote to handle the case
where the host raises SIGSEGV for a userspace program running on UML
(nommu).

# and I should remember that my XXX tags mark things that need fixing....

Let me try to explain what happens and what I tried to solve.

A SEGV signal caused by a userspace program is delivered to userspace, but
if we don't fix the code that raised the signal, then after (um)
rt_sigreturn, execution restarts from $rip and raises SIGSEGV again.

# so, yes, we already rely on both the host's and um's rt_sigreturn to
  restore various things.

When a uml userspace program crashes with SIGSEGV:

- the host kernel raises SIGSEGV (at the original $rip)
- the signal is caught by the uml process (hard_handler)
- a signal is raised to the uml userspace process (segv_handler)
- the handler ends (hard_handler)
- (host) the restorer runs (rt_sigreturn, registered by (libc) sigaction,
  not (host) rt_sigaction)
- execution returns to the original $rip
- (back to top)

This is the case where the endless loop happens.
um's sa_handler isn't called because (um) rt_sigreturn isn't called,
and my original attempt (__userspace_relay_signal) was meant to address this.
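
# as an illustration only (this is not UML code): the restart behaviour
  can be reproduced on a plain Linux host. The sketch below is
  hypothetical; faults_until_fixed() and count_and_maybe_fix() are names
  made up for this example. A handler that returns without fixing the
  cause of the fault lets the restorer resume at the same instruction, so
  the fault repeats until the handler changes something (here an
  mprotect() on the page; in the patch, the rewritten RIP).

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t fault_count;
static volatile sig_atomic_t fix_on_fault;
static void *guarded_page;
static size_t guarded_len;

/* Counts every SIGSEGV; repairs the cause only on the fix_on_fault-th one. */
static void count_and_maybe_fix(int sig, siginfo_t *info, void *uctx)
{
	(void)sig;
	(void)info;
	(void)uctx;

	fault_count++;
	/* mprotect() is not on the POSIX async-signal-safe list; it is a
	 * plain syscall wrapper on Linux, which is assumed here. */
	if (fault_count >= fix_on_fault)
		mprotect(guarded_page, guarded_len, PROT_READ | PROT_WRITE);
	/* Returning runs the restorer (rt_sigreturn), which resumes at the
	 * saved instruction pointer, i.e. at the faulting store itself. */
}

/* Returns how many SIGSEGVs a single store triggered, or -1 on setup failure. */
int faults_until_fixed(int fix_on)
{
	struct sigaction sa, old;

	guarded_len = (size_t)sysconf(_SC_PAGESIZE);
	guarded_page = mmap(NULL, guarded_len, PROT_NONE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (guarded_page == MAP_FAILED)
		return -1;

	fault_count = 0;
	fix_on_fault = fix_on;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = count_and_maybe_fix;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old) != 0) {
		munmap(guarded_page, guarded_len);
		return -1;
	}

	/* One store: it faults and is restarted after every handler run. */
	*(volatile char *)guarded_page = 1;

	sigaction(SIGSEGV, &old, NULL);
	munmap(guarded_page, guarded_len);
	return fault_count;
}
```

  Calling faults_until_fixed(3) makes the one store fault three times
  before it finally succeeds. With a handler that never repairs anything,
  the loop would not terminate, which is the situation described above.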

I agree that calling a dummy syscall (indeed, getpid) is lazy.
I'm trying to introduce another routine that jumps into userspace and
calls (um) rt_sigreturn after (host) rt_sigreturn.
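
# for reference, a small host-side check that 39 is getpid on x86_64.
  This is only a sketch: it goes through libc's syscall(2) wrapper rather
  than __kernel_vsyscall as in the patch, and
  dummy_syscall_matches_getpid() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Only the 64-bit x86_64 ABI is checked; x32 adds an offset to the number. */
#if defined(__x86_64__) && !defined(__ILP32__)
_Static_assert(SYS_getpid == 39, "getpid is expected to be syscall 39");
#endif

/* Returns 1 when the raw syscall and the libc wrapper agree on the pid. */
int dummy_syscall_matches_getpid(void)
{
	long raw = syscall(SYS_getpid);

	return raw == (long)getpid();
}
```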

> And this is really confusing me. The way I am reading it, the code
> tries to do:
>    1. Rewrite RIP to jump to __userspace_relay_signal
>    2. Trigger a getpid syscall (to do "nothing"?)
>    3. Let do_syscall_64 fire the signal from interrupt_end

correct.

> However, then that really confuses me, because:
>  * If I am reading it correctly, then this approach will destroy the
>    contents of various registers (RIP, RAX and likely more)
>  * This would result in an incorrect mcontext in the userspace signal
>    handler (which could be relevant if userspace is inspecting it)
>  * However, worst, rt_sigreturn will eventually jump back
>    into __userspace_relay_signal, which has nothing to return to.
>  * Also, relay_signal doesn't use this? What happens for a SIGFPE, how
>    is userspace interrupted immediately in that case?

relay_signal indeed shares the same goal, but I guess the issue with
`mc->gregs[REG_RIP]` (endless signals) still exists there.

> Honestly, I really think we should take a step back and swap the
> current syscall entry/exit code. That would likely also simplify
> floating point register handling, which I think is currently
> insufficient to deal with the odd special cases caused by different
> x86_64 hardware extensions.
> 
> Basically, I think nommu mode should use the same general approach as
> the current SECCOMP mode. Which is to use rt_sigreturn to jump into
> userspace and let the host kernel deal with the ugly details of how to
> do that.

I looked at how MMU mode (ptrace/seccomp) handles this case.

In nommu mode, we don't have an external process to catch signals, so
nommu mode uses hard_handler() to catch SEGV/FPE from userspace
programs, whereas MMU mode calls segv_handler outside of a signal
handler context.

# correct me if I'm wrong.

Thus, MMU mode doesn't run into this situation.


I'm trying various approaches, such as calling um's rt_sigreturn instead
of the host's one; that doesn't work because the host's restore
procedures (unblocking masked signals, restoring register state, etc.)
aren't run.
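
# a rough host-side illustration of the "unblocking masked signals"
  part, not the UML code path. In this sketch the handler leaves via
  siglongjmp() with a jump buffer that did not save the signal mask, so
  the host's rt_sigreturn never runs and the signal stays blocked.
  signal_left_blocked_after_escape() and escaping_handler() are
  hypothetical names.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf escape_point;

/* Leaves the handler without returning, so no sigreturn is executed. */
static void escaping_handler(int sig)
{
	(void)sig;
	siglongjmp(escape_point, 1);
}

/* Returns 1 if SIGUSR1 is still blocked after escaping, 0 if not, -1 on error. */
int signal_left_blocked_after_escape(void)
{
	struct sigaction sa;
	sigset_t only_usr1, current;
	int blocked;

	sigemptyset(&only_usr1);
	sigaddset(&only_usr1, SIGUSR1);
	/* Start from a known state: SIGUSR1 unblocked. */
	if (sigprocmask(SIG_UNBLOCK, &only_usr1, NULL) != 0)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = escaping_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;

	/* savemask == 0: the jump buffer does not record the signal mask. */
	if (sigsetjmp(escape_point, 0) == 0)
		raise(SIGUSR1);	/* the handler jumps back to sigsetjmp() */

	/* The kernel blocked SIGUSR1 for the handler; nothing undid that. */
	if (sigprocmask(SIG_SETMASK, NULL, &current) != 0)
		return -1;
	blocked = sigismember(&current, SIGUSR1);

	/* Clean up so that callers are not left with a blocked signal. */
	sigprocmask(SIG_UNBLOCK, &only_usr1, NULL);
	signal(SIGUSR1, SIG_DFL);
	return blocked;
}
```

  Register state is a separate problem on top of this; the sketch only
  shows the signal mask side.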

I'll post an update here if I find a good direction, but it would be
great if you have ideas on how this should be handled.

-- Hajime

> I believe that this requires a second "userspace" sigaltstack in
> addition to the current "IRQ" sigaltstack. Then switching in between
> the two (note that the "userspace" one is also used for IRQs if those
> happen while userspace is executing).
> 
> So, in principle I would think something like:
>  * to jump into userspace, you would:
>     - block all signals
>     - set "userspace" sigaltstack
>     - setup mcontext for rt_sigreturn
>     - setup RSP for rt_sigreturn
>     - call rt_sigreturn syscall
>  * all signal handlers can (except pure IRQs):
>     - check on which stack they are
>       -> easy to detect whether we are in kernel mode
>     - for IRQs one can probably handle them directly (and return)
>     - in user mode:
>        + store mcontext location and information needed for rt_sigreturn
>        + jump back into kernel task stack
>  * kernel task handler to continue would:
>     - set sigaltstack to IRQ stack
>     - fetch register from mcontext
>     - unblock all signals
>     - handle syscall/signal in whatever way needed
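
# a sketch of the "check on which stack they are" step, under the
  assumption that the check is a simple address range test against the
  registered sigaltstack. This is plain host code, not the UML
  implementation; handler_runs_on_altstack(), address_on_stack() and the
  64 KiB stack size are made up for this example.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static stack_t user_altstack;
static volatile sig_atomic_t handler_on_altstack;

/* Range test: does addr fall inside the stack described by st? */
static int address_on_stack(const stack_t *st, const void *addr)
{
	uintptr_t base = (uintptr_t)st->ss_sp;
	uintptr_t p = (uintptr_t)addr;

	return p >= base && p < base + st->ss_size;
}

/* Records whether the handler's own frame lives on the alternate stack. */
static void probe_handler(int sig)
{
	char marker;

	(void)sig;
	handler_on_altstack = address_on_stack(&user_altstack, &marker);
}

/* Returns 1 if the handler ran on the alternate stack, 0 if not, -1 on error. */
int handler_runs_on_altstack(int use_altstack)
{
	struct sigaction sa;
	stack_t old;
	size_t size = 64 * 1024;	/* arbitrary, larger than MINSIGSTKSZ */

	user_altstack.ss_sp = malloc(size);
	if (user_altstack.ss_sp == NULL)
		return -1;
	user_altstack.ss_size = size;
	user_altstack.ss_flags = 0;
	if (sigaltstack(&user_altstack, &old) != 0) {
		free(user_altstack.ss_sp);
		return -1;
	}

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = probe_handler;
	sa.sa_flags = use_altstack ? SA_ONSTACK : 0;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGUSR1, &sa, NULL);

	handler_on_altstack = -1;
	raise(SIGUSR1);

	/* Restore the previous alternate stack before freeing ours. */
	signal(SIGUSR1, SIG_DFL);
	sigaltstack(&old, NULL);
	free(user_altstack.ss_sp);
	return handler_on_altstack;
}
```

  With two alternate stacks ("userspace" and "IRQ"), the same range test
  against each would tell the handler which mode was interrupted.
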
> 
> Now that I wrote about it, I am thinking that it might be possible to
> just use the kernel task stack for the signal stack. One would probably
> need to increase the kernel stack size a bit, but it would also mean
> that no special code is needed for "rt_sigreturn" handling. The rest
> would remain the same.
> 
> Thoughts?
> 
> Benjamin
> 
> > [SNIP]
> 

