From: Hajime Tazaki <thehajime@gmail.com>
To: benjamin@sipsolutions.net
Cc: linux-um@lists.infradead.org, ricarkol@google.com,
	Liam.Howlett@oracle.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v10 09/13] x86/um: nommu: signal handling
Date: Mon, 30 Jun 2025 10:04:00 +0900
Message-ID: <m2plem3urj.wl-thehajime@gmail.com>
In-Reply-To: <734965ac85b2c4cf481cc98ac53052fd5064d30e.camel@sipsolutions.net>


Hello Benjamin,

On Sat, 28 Jun 2025 00:02:05 +0900,
Benjamin Berg wrote:
> 
> Hi,
> 
> On Fri, 2025-06-27 at 22:50 +0900, Hajime Tazaki wrote:
> > thanks for the comment on the complicated part of the kernel (signal).
> 
> This stuff isn't simple.
> 
> Actually, I am starting to think that the current MMU UML kernel also
> needs a redesign with regard to signal handling and stack use in that
> case. My current impression is that the design right now only permits
> voluntary scheduling. More specifically, scheduling in response to an
> interrupt is impossible.
> 
> I suppose that works fine, but it also does not seem quite right.

Thanks for the info.  It's very useful for understanding what's going on.

(snip)

> > > > +void set_mc_userspace_relay_signal(mcontext_t *mc)
> > > > +{
> > > > + mc->gregs[REG_RIP] = (unsigned long) __userspace_relay_signal;
> > > > +}
> > > > +
> > 
> > This is a bit scary code which I tried to handle when SIGSEGV is
> > raised by host for a userspace program running on UML (nommu).
> > 
> > # and I should remember my XXX tag is important to fix....
> > 
> > let me try to explain what happens and what I tried to solve.
> > 
> > The SEGV signal from userspace program is delivered to userspace but
> > if we don't fix the code raising the signal, after (um) rt_sigreturn,
> > it will restart from $rip and raise SIGSEGV again.
> > 
> > # so, yes, we've already relied on host and um's rt_sigreturn to
> >   restore various things.
> > 
> > when a uml userspace crashes with SIGSEGV,
> > 
> > - host kernel raises SIGSEGV (at original $rip)
> > - caught by uml process (hard_handler)
> > - raise a signal to uml userspace process (segv_handler)
> > - handler ends (hard_handler)
> > - (host) run restorer (rt_sigreturn, registered by (libc)sigaction,
> >   not (host) rt_sigaction)
> > - return back to the original $rip
> > - (back to top)
> > 
> > this is the case where an endless loop happens.
> > um's sa_handler isn't called as rt_sigreturn (um) isn't called.
> > and my original attempt (__userspace_relay_signal) is what I tried.
> > 
> > I agree that it is lazy to call a dummy syscall (indeed, getpid).
> > I'm trying to introduce another routine to jump into userspace and
> > call (um) rt_sigreturn after (host) rt_sigreturn.
> > 
> > > And this is really confusing me. The way I am reading it, the code
> > > tries to do:
> > >    1. Rewrite RIP to jump to __userspace_relay_signal
> > >    2. Trigger a getpid syscall (to do "nothing"?)
> > >    3. Let do_syscall_64 fire the signal from interrupt_end
> > 
> > correct.
> > 
> > > However, then that really confuses me, because:
> > >  * If I am reading it correctly, then this approach will destroy the
> > >    contents of various registers (RIP, RAX and likely more)
> > >  * This would result in an incorrect mcontext in the userspace signal
> > >    handler (which could be relevant if userspace is inspecting it)
> > >  * However, worst, rt_sigreturn will eventually jump back
> > >    into __userspace_relay_signal, which has nothing to return to.
> > >  * Also, relay_signal doesn't use this? What happens for a SIGFPE, how
> > >    is userspace interrupted immediately in that case?
> > 
> > relay_signal shares the same goal of this, indeed.
> > but the issue with `mc->gregs[REG_RIP]` (endless signals) still exists
> > I guess.
> 
> Well, endless signals only exist as long as you exit to the same
> location. My suggestion was to read the user state from the mcontext
> (as SECCOMP mode does it) and executing the signal right away, i.e.:

Thanks too; below is my understanding.

>  * Fetch the current registers from the mcontext

I guess this is already done in sig_handler_common().
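
For reference, here is a minimal standalone sketch of what "fetching the
registers from the mcontext" looks like on the host side. It is plain C
for x86_64 Linux with glibc, not UML code; struct saved_regs,
fetch_regs_handler() and capture_regs_once() are hypothetical names used
only for illustration.

```c
#define _GNU_SOURCE		/* exposes REG_RIP / REG_RSP in <ucontext.h> */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <ucontext.h>

/* Hypothetical container for the interrupted register state. */
struct saved_regs {
	unsigned long rip;
	unsigned long rsp;
	int valid;
};

static volatile struct saved_regs last_regs;

/*
 * SA_SIGINFO handler: the third argument points at the ucontext that
 * the host kernel pushed onto the signal stack.
 */
static void fetch_regs_handler(int sig, siginfo_t *si, void *ctx)
{
	ucontext_t *uc = ctx;
	mcontext_t *mc = &uc->uc_mcontext;

	(void)sig;
	(void)si;
	last_regs.rip = (unsigned long)mc->gregs[REG_RIP];
	last_regs.rsp = (unsigned long)mc->gregs[REG_RSP];
	last_regs.valid = 1;
}

/*
 * Install the handler, raise SIGUSR1 and report whether the handler
 * captured the interrupted registers.  Returns 1 on success.
 */
int capture_regs_once(void)
{
	struct sigaction sa;
	int rc;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = fetch_regs_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	rc = sigaction(SIGUSR1, &sa, NULL);
	assert(rc == 0);
	(void)rc;

	last_regs.valid = 0;
	raise(SIGUSR1);		/* delivered before raise() returns */
	return last_regs.valid && last_regs.rip != 0 && last_regs.rsp != 0;
}
```

Calling capture_regs_once() from a single-threaded main() should return
1 once the handler has copied RIP and RSP out of the host mcontext.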

>  * Push the signal context onto the userspace stack

My guess is that this is already done in handle_signal() =>
setup_signal_stack_si().

>  * Modify the host mcontext to set registers for the signal handler

This is the part I don't understand well.
- When you say "for the signal handler", do you mean the host handler
  or the userspace handler?
- If the former (the host one), the mcontext is probably already set
  up, so it might not be the one you mentioned.
- If the latter, how does the original handler (the host one,
  hard_handler()) work?  Even if we can call the userspace handler
  instead of the host one, we still need to call the host handler (and
  restorer).  Do we call both?
- And by "to set registers", which registers do you mean?  The
  registers inspected by the userspace signal handler?  But if you set
  a register in the host registers, for instance RIP, to the fault
  location, won't it return to that RIP after the handler and trigger
  the fault again?
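
To make the RIP question concrete, here is a minimal standalone sketch
of one possible reading of "modify the host mcontext, then return". It
is plain C for x86_64 Linux with glibc, not UML code, and it is only my
assumption of what is meant. landing(), redirect_handler() and
fault_and_redirect() are hypothetical names. The host handler rewrites
RIP, RSP and RDI in the saved context and returns; the host
rt_sigreturn then resumes in landing() instead of at the faulting
instruction, so the fault is not restarted.

```c
#define _GNU_SOURCE		/* exposes REG_RIP etc. in <ucontext.h> */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <ucontext.h>

static jmp_buf resume_env;			/* where the caller continues */
static volatile sig_atomic_t landed_sig;	/* set by landing() */

/*
 * Hypothetical stand-in for the userspace handler entry point.  It is
 * entered via the host rt_sigreturn, not via a call, so it has no
 * return address and must not return.
 */
static void landing(int sig)
{
	landed_sig = sig;
	longjmp(resume_env, 1);
}

/* Host-side handler: rewrite the saved context, then simply return. */
static void redirect_handler(int sig, siginfo_t *si, void *ctx)
{
	ucontext_t *uc = ctx;
	greg_t *gregs = uc->uc_mcontext.gregs;
	unsigned long sp = (unsigned long)gregs[REG_RSP];

	(void)si;
	sp -= 128;			/* skip the x86_64 red zone */
	sp = (sp & ~0xfUL) - 8;		/* ABI alignment at function entry */
	gregs[REG_RSP] = (greg_t)sp;
	gregs[REG_RDI] = sig;		/* first argument of landing() */
	gregs[REG_RIP] = (greg_t)(unsigned long)landing;
}

/* Returns 1 if the SIGSEGV was redirected to landing() exactly once. */
int fault_and_redirect(void)
{
	struct sigaction sa, old;
	volatile int *guard;
	int rc;

	/* An inaccessible page gives a reliable SIGSEGV on write. */
	guard = mmap(NULL, 4096, PROT_NONE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (guard == MAP_FAILED)
		return 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = redirect_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	rc = sigaction(SIGSEGV, &sa, &old);
	assert(rc == 0);
	(void)rc;

	landed_sig = 0;
	if (setjmp(resume_env) == 0)
		*guard = 1;		/* faults: host raises SIGSEGV here */

	sigaction(SIGSEGV, &old, NULL);
	munmap((void *)guard, 4096);
	return landed_sig == SIGSEGV;
}
```

Because the handler returns normally, the host restorer still runs and
unblocks SIGSEGV, which is why a plain longjmp() is enough in landing().
Whether the same idea fits hard_handler() in nommu mode is exactly what
I am unsure about.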

>  * Jump back to userspace by doing a "return"

This is also still unclear to me.

It would be very helpful if you could point me to the code (in the
uml/next tree) that shows how SECCOMP mode does this.  I'm also looking
at it, but it is really hard to map what you described onto the code
(sorry).

All of the above runs within hard_handler() in nommu mode on SIGSEGV.
My best guess is that this is different from what ptrace/seccomp do.

> Said differently, I really prefer deferring as much logic as possible
> to the host. This is both safer and easier to understand. Plus, it also
> has the advantage of making it simpler to port UML to other
> architectures.

okay.

> 
> > > Honestly, I really think we should take a step back and swap the
> > > current syscall entry/exit code. That would likely also simplify
> > > floating point register handling, which I think is currently
> > > insufficient to deal with the odd special cases caused by different
> > > x86_64 hardware extensions.
> > > 
> > > Basically, I think nommu mode should use the same general approach as
> > > the current SECCOMP mode. Which is to use rt_sigreturn to jump into
> > > userspace and let the host kernel deal with the ugly details of how to
> > > do that.
> > 
> > I looked at how MMU mode (ptrace/seccomp) does handle this case.
> > 
> > In nommu mode, we don't have external process to catch signals so, the
> > nommu mode uses hard_handler() to catch SEGV/FPE of userspace
> > programs.  While mmu mode calls segv_handler not in a context of
> > signal handler.
> > 
> > # correct me if I'm wrong.
> > 
> > thus, mmu mode doesn't have this situation.
> 
> Yes, it does not have this specific issue. But see the top of the mail
> for other issues that are somewhat related.
> 
> > I'm attempting various ways; calling um's rt_sigreturn instead of
> > host's one, which doesn't work as host restore procedures (unblocking
> > masked signals, restoring register states, etc) aren't called.
> > 
> > I'll update here if I found a good direction, but would be great if
> > you see how it should be handled.
> 
> Can we please discuss possible solutions? We can figure out the details
> once it is clear how the interaction with the host should work.

I just wanted to let you know that I'm working on it.  Your comments
are always helpful to me.  Thanks.

-- Hajime

> I still think that the idea of using the kernel task stack as the
> signal stack is really elegant. Actually, doing that in normal UML may
> be how we can fix the issues mentioned at the top of my mail. And for
> nommu, we can also use the host mcontext to jump back into userspace
> using a simple "return".
> 
> Conceptually it seems so simple.
> 
> Benjamin
> 
> 
> > 
> > -- Hajime
> > 
> > > I believe that this requires a second "userspace" sigaltstack in
> > > addition to the current "IRQ" sigaltstack. Then switching in between
> > > the two (note that the "userspace" one is also used for IRQs if those
> > > happen while userspace is executing).
> > > 
> > > So, in principle I would think something like:
> > >  * to jump into userspace, you would:
> > >     - block all signals
> > >     - set "userspace" sigaltstack
> > >     - setup mcontext for rt_sigreturn
> > >     - setup RSP for rt_sigreturn
> > >     - call rt_sigreturn syscall
> > >  * all signal handlers can (except pure IRQs):
> > >     - check on which stack they are
> > >       -> easy to detect whether we are in kernel mode
> > >     - for IRQs one can probably handle them directly (and return)
> > >     - in user mode:
> > >        + store mcontext location and information needed for rt_sigreturn
> > >        + jump back into kernel task stack
> > >  * kernel task handler to continue would:
> > >     - set sigaltstack to IRQ stack
> > >     - fetch register from mcontext
> > >     - unblock all signals
> > >     - handle syscall/signal in whatever way needed
> > > 
> > > Now that I wrote about it, I am thinking that it might be possible to
> > > just use the kernel task stack for the signal stack. One would probably
> > > need to increase the kernel stack size a bit, but it would also mean
> > > that no special code is needed for "rt_sigreturn" handling. The rest
> > > would remain the same.
> > > 
> > > Thoughts?
> > > 
> > > Benjamin
> > > 
> > > > [SNIP]
> > > 
> > 
> 

