All of lore.kernel.org
 help / color / mirror / Atom feed
* why is sigreturn() skipping some code on the way out of kernel?
@ 2012-05-01 23:37 Al Viro
  2012-05-02  4:11 ` David Mosberger-Tang
  2012-05-02  5:46 ` Al Viro
  0 siblings, 2 replies; 3+ messages in thread
From: Al Viro @ 2012-05-01 23:37 UTC (permalink / raw)
  To: linux-ia64

AFAICS, pNonSys had been used for several purposes from the very
beginning; one of them is suppression of syscall restarts and
since 2.3.99pre6-4 rt_sigreturn(2) is relying on that to prevent
a really unpleasant pile of problems.  So far, so good, but that's
not the only thing conditional on pNonSys - there's also
(pNonSys) br.cond.dpnt dont_preserve_current_frame
        COVER                           // add current frame into dirty partition and set cr.ifs
        ;;
        mov r19=ar.bsp                  // get new backing store pointer
rbs_switch:
        sub r16=r16,r18                 // krbs = old bsp - size of dirty partition
        cmp.ne p9,p0=r0,r0              // clear p9 to skip restore of cr.ifs
        ;;
        sub r19=r19,r16                 // calculate total byte size of dirty partition
        add r18d,r18                  // don't force in0-in7 into memory...
        ;;
        shl r19=r19,16                  // shift size of dirty partition into loadrs position
        ;;
dont_preserve_current_frame:

in .work_processed_kernel and I'd really like to know if skipping it on
sigreturn() is an intended effect.  I don't know ia64 well enough to even
tell what that code is doing, let alone whether it should be done on
sigreturn() or not...

The reason I'm asking is that we really need to loop after hitting
notify_resume_user() when we are returning to userland; not to mention
anything else, NEED_RESCHED might've been set while we had been building
sigframes.  Moreover, handling all pending signals is the normal behaviour
on most of the architectures; some of the reasons for it are covered by
kinda-sorta looping inside do_signal() (to catch self-inflicted SIGSEGV),
but the full-blown variant is saner anyway and it's better to have that
logics consistent between the architectures.

Of course, taking that loop into entry.S needs a different mechanism
for preventing double restarts; I wonder if using the same method that
is used by sigreturn() would be safe.  FWIW, entry.S bits are very
small; the question is whether they are correct...

Comments?

diff --git a/arch/ia64/kernel/entry.S b/arch/ia64/kernel/entry.S
index 1ccbe12..27322e6 100644
--- a/arch/ia64/kernel/entry.S
+++ b/arch/ia64/kernel/entry.S
@@ -1184,7 +1184,8 @@ skip_rbs_switch:
 
 .notify:
 (pUStk)	br.call.spnt.many rp=notify_resume_user
-.ret10:	cmp.ne p6,p0=r0,r0	// p6 <- 0 (don't re-check)
+.ret10:
+(pUStk)	cmp.eq.unc p6,p0=r0,r0		// p6 <- pUStk
 (pLvSys)br.cond.sptk.few  __paravirt_pending_syscall_end
 	br.cond.sptk.many .work_processed_kernel
 
@@ -1250,6 +1251,7 @@ GLOBAL_ENTRY(notify_resume_user)
 (pSys)	mov out2=1				// out2=1 => we're in a syscall
 	;;
 (pNonSys) mov out2=0				// out2=0 => not a syscall
+	cmp.eq pNonSys,pSys=r0,r0		// don't do restarts twice
 	.fframe 16
 	.spillsp ar.unat, 16
 	st8 [sp]=r9,-16				// allocate space for ar.unat and save it

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: why is sigreturn() skipping some code on the way out of kernel?
  2012-05-01 23:37 why is sigreturn() skipping some code on the way out of kernel? Al Viro
@ 2012-05-02  4:11 ` David Mosberger-Tang
  2012-05-02  5:46 ` Al Viro
  1 sibling, 0 replies; 3+ messages in thread
From: David Mosberger-Tang @ 2012-05-02  4:11 UTC (permalink / raw)
  To: linux-ia64

[Resending, this time without html bloat...]

Al,

AFAIR (which isn't saying much), it's correct.  sigreturn() is not a
normal syscall.  The code that's being jumped over normally ensures
that the syscall args that were passed into a syscall are present in
user-level upon returning from the syscall.  However, since
sigreturn() does a non-local return, that behavior would be wrong.
For my own reference (and the other few people who still might have a
copy of my book): this is described on page 233, last paragraph.

  --david

On Tue, May 1, 2012 at 5:37 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> AFAICS, pNonSys had been used for several purposes from the very
> beginning; one of them is suppression of syscall restarts and
> since 2.3.99pre6-4 rt_sigreturn(2) is relying on that to prevent
> a really unpleasant pile of problems.  So far, so good, but that's
> not the only thing conditional on pNonSys - there's also
> (pNonSys) br.cond.dpnt dont_preserve_current_frame
>        COVER                           // add current frame into dirty partition and set cr.ifs
>        ;;
>        mov r19=ar.bsp                  // get new backing store pointer
> rbs_switch:
>        sub r16=r16,r18                 // krbs = old bsp - size of dirty partition
>        cmp.ne p9,p0=r0,r0              // clear p9 to skip restore of cr.ifs
>        ;;
>        sub r19=r19,r16                 // calculate total byte size of dirty partition
>        add r18d,r18                  // don't force in0-in7 into memory...
>        ;;
>        shl r19=r19,16                  // shift size of dirty partition into loadrs position
>        ;;
> dont_preserve_current_frame:
>
> in .work_processed_kernel and I'd really like to know if skipping it on
> sigreturn() is an intended effect.  I don't know ia64 well enough to even
> tell what that code is doing, let alone whether it should be done on
> sigreturn() or not...
>
> The reason I'm asking is that we really need to loop after hitting
> notify_resume_user() when we are returning to userland; not to mention
> anything else, NEED_RESCHED might've been set while we had been building
> sigframes.  Moreover, handling all pending signals is the normal behaviour
> on most of the architectures; some of the reasons for it are covered by
> kinda-sorta looping inside do_signal() (to catch self-inflicted SIGSEGV),
> but the full-blown variant is saner anyway and it's better to have that
> logics consistent between the architectures.
>
> Of course, taking that loop into entry.S needs a different mechanism
> for preventing double restarts; I wonder if using the same method that
> is used by sigreturn() would be safe.  FWIW, entry.S bits are very
> small; the question is whether they are correct...
>
> Comments?
>
> diff --git a/arch/ia64/kernel/entry.S b/arch/ia64/kernel/entry.S
> index 1ccbe12..27322e6 100644
> --- a/arch/ia64/kernel/entry.S
> +++ b/arch/ia64/kernel/entry.S
> @@ -1184,7 +1184,8 @@ skip_rbs_switch:
>
>  .notify:
>  (pUStk)        br.call.spnt.many rp=notify_resume_user
> -.ret10:        cmp.ne p6,p0=r0,r0      // p6 <- 0 (don't re-check)
> +.ret10:
> +(pUStk)        cmp.eq.unc p6,p0=r0,r0          // p6 <- pUStk
>  (pLvSys)br.cond.sptk.few  __paravirt_pending_syscall_end
>        br.cond.sptk.many .work_processed_kernel
>
> @@ -1250,6 +1251,7 @@ GLOBAL_ENTRY(notify_resume_user)
>  (pSys) mov out2=1                              // out2=1 => we're in a syscall
>        ;;
>  (pNonSys) mov out2=0                           // out2=0 => not a syscall
> +       cmp.eq pNonSys,pSys=r0,r0               // don't do restarts twice
>        .fframe 16
>        .spillsp ar.unat, 16
>        st8 [sp]=r9,-16                         // allocate space for ar.unat and save it
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
eGauge Systems LLC, http://egauge.net/, 1.877-EGAUGE1, fax 720.545.9768

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: why is sigreturn() skipping some code on the way out of kernel?
  2012-05-01 23:37 why is sigreturn() skipping some code on the way out of kernel? Al Viro
  2012-05-02  4:11 ` David Mosberger-Tang
@ 2012-05-02  5:46 ` Al Viro
  1 sibling, 0 replies; 3+ messages in thread
From: Al Viro @ 2012-05-02  5:46 UTC (permalink / raw)
  To: linux-ia64

On Tue, May 01, 2012 at 10:09:52PM -0600, David Mosberger-Tang wrote:
> Al,
> 
> AFAIR (which isn't saying much), it's correct.  sigreturn() is not a normal
> syscall.  The code that's being jumped over normally ensures that the
> syscall args that were passed into a syscall are present in user-level upon
> returning from the syscall.  However, since sigreturn() does a non-local
> return, that behavior would be wrong.  For my own reference (and the other
> few people who still might have a copy of my book): this is described on
> page 233, last paragraph.

Umm...  Well, if you've taken a signal with a handler, return to userland
is also non-local, in pretty much the same sense.

I really don't know how it's done on ia64, but in very general terms what
happens on signal arrival is better described if you look at syscalls,
interrupts and exceptions as coroutine calls - you get the state of userland
task saved on entry (and it may depend on the way we'd entered the kernel);
usually large part of the saved state is in pt_regs, but there might be more
to it.  sys_foo() operates on that saved state (well, the function itself
+ bits of glue right around its call).  You not so much return to userland
as resume it, by the saved state.  Handling a signal with userland handler
is basically
	* possibly handle restarts, by updating the saved state
	* encode the saved state and store it in sigcontext et.al.
	* switch the saved state to "entering the handler"
	* resume userland at the new saved state
while sigreturn is
	* decode the saved state from sigcontext
	* if we take signals, act as usual, except that no restarts are
to be done - all adjustments to the saved state we have at that point had
been done when we took that signal in the first place
	* resume userland at whatever saved state we got - either the
one we had encoded into sigcontext back then, or the one matching the
signal handler for signal we took (if we took any during sigreturn(),
that is).  In the latter case, the state that was decoded from sigcontext
gets reencoded into new sigcontext to be picked when we are done with that
new signal handler.

So if you have different actions for resuming userland (which is not unusual
and seems to be what you are talking about), sigreturn ought to match
the saved state when we took the signal.  I.e. in the simplest case (no
signals taken by sigreturn itself) it would depend on whether the signal
had been seen by syscall or by interrupt/exception.

AFAICS, you are using pSys/pNonSys for two things - restart suppression
(sigreturn _is_ equivalent to non-syscall in this respect) and choosing
how to restore the userland state.  And in that respect it's more
complicated.

I don't have your book, so if you could give a bit more details on that
thing...

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2012-05-02  5:46 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-05-01 23:37 why is sigreturn() skipping some code on the way out of kernel? Al Viro
2012-05-02  4:11 ` David Mosberger-Tang
2012-05-02  5:46 ` Al Viro

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.