* Sigcontext->sc_pc Passed to User
@ 2002-07-11  9:08 ` Kevin D. Kissell
  0 siblings, 0 replies; 18+ messages in thread
From: Kevin D. Kissell @ 2002-07-11  9:08 UTC (permalink / raw)
  To: linux-mips

In responding to an enquiry from one of MIPS' third-party
software vendors, I noted something that seems a little
broken to me in the current (and maybe all historical)
MIPS/Linux kernels.  Please forgive me for opening
old wounds if this has been beaten to death in the past.

When a user catches a signal, such as SIGBUS, the
signal "payload" includes a pointer to a sigcontext
structure on the stack, containing the state of the
CPU when the exception associated with the signal
occurred.  But not exactly.  We seem to consistently
call compute_return_epc() before send_sig() or
force_sig().  This results in the user being passed
an indication of the faulting PC that is one instruction
past the true location.  That would be no problem,
except that the faulting instruction may have been 
in a branch delay slot, such that there is no practical
and reliable way for the signal handler to determine
which instruction failed on the basis of the sigcontext
data.

It is, of course, important that execution resume
at the instruction following any instruction generating
an exception/signal.  But that's not the same thing
as saying that the sigcontext should report the resumption
EPC instead of the faulting EPC.  There are various
ways of dealing with this, but before going into any
of them, I'm curious as to whether this has been 
discussed before, and whether anyone thinks that 
things really should be the way they are.

            Regards,

            Kevin K.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sigcontext->sc_pc Passed to User
  2002-07-11  9:08 ` Kevin D. Kissell
  (?)
@ 2002-07-11 13:17 ` Maciej W. Rozycki
  2002-07-11 15:16     ` Kevin D. Kissell
  -1 siblings, 1 reply; 18+ messages in thread
From: Maciej W. Rozycki @ 2002-07-11 13:17 UTC (permalink / raw)
  To: Kevin D. Kissell; +Cc: linux-mips

On Thu, 11 Jul 2002, Kevin D. Kissell wrote:

> In responding to an enquiry from one of MIPS' third-party
> software vendors, I noted something that seems a little
> broken to me in the current (and maybe all historical)
> MIPS/Linux kernels.  Please forgive me for opening
> old wounds if this has been beaten to death in the past.

 :-/

> When a user catches a signal, such as SIGBUS, the
> signal "payload" includes a pointer to a sigcontext
> structure on the stack, containing the state of the
> CPU when the exception associated with the signal
> occurred.  But not exactly.  We seem to consistently
> call compute_return_epc() before send_sig() or
> force_sig().  This results in the user being passed
> an indication of the faulting PC that is one instruction
> past the true location.  That would be no problem,
> except that the faulting instruction may have been 
> in a branch delay slot, such that there is no practical
> and reliable way for the signal handler to determine
> which instruction failed on the basis of the sigcontext
> data.

 That needs to be done globally, once and forever for all kinds of signals
passed to a program.  I have partial fixes that I am using privately
already, but a complete solution is on my to-do list. 

> It is, of course, important that execution resume
> at the instruction following any instruction generating
> an exception/signal.  But that's not the same thing
> as saying that the sigcontext should report the resumption
> EPC instead of the faulting EPC.  There are various
> ways of dealing with this, but before going into any
> of them, I'm curious as to whether this has been 
> discussed before, and whether anyone thinks that 
> things really should be the way they are.

 I believe the resumption should happen with EPC unmodified.  A handler
may set EPC differently if it wants (possibly with longjmp() or by
interpreting code at EPC and modifying EPC appropriately).  For the three
signal handling possibilities, I'd do that as follows (assuming SIGBUS,
SIGSEGV, etc. lethal signals): 

- SIG_IGN: return to EPC with no action.  A program will loop
  indefinitely, but if that's what a user wants...

- SIG_DFL: kill.

- HANDLER: call a handler with the signal context unmodified and let the
  user code decide what to do.
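
The longjmp() route mentioned above can be sketched in portable C. This is
only an illustration, not code from the thread: it assumes Linux, where
reading a PROT_NONE page raises SIGSEGV, and the names recover_point,
on_segv, make_guard_page and probe_read are hypothetical.

```c
#define _GNU_SOURCE            /* MAP_ANONYMOUS on glibc */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

static sigjmp_buf recover_point;

/* Handler: abandon the faulting instruction instead of returning to it. */
static void on_segv(int signo)
{
    (void)signo;
    siglongjmp(recover_point, 1);
}

/* Map one page that cannot be read or written; NULL on failure. */
void *make_guard_page(void)
{
    void *p = mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Returns 1 if reading *addr faulted, 0 if it succeeded, -1 on setup error. */
int probe_read(const volatile char *addr)
{
    struct sigaction sa, old;
    int faulted;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    /* savemask = 1 so SIGSEGV is unblocked again after siglongjmp(). */
    if (sigsetjmp(recover_point, 1) == 0) {
        char c = *addr;        /* volatile read: may fault */
        (void)c;
        faulted = 0;
    } else {
        faulted = 1;           /* reached only via the handler */
    }

    sigaction(SIGSEGV, &old, NULL);   /* restore the previous disposition */
    return faulted;
}
```

Because the handler never returns normally, the PC saved in the signal
context is irrelevant to this pattern; it matters only to handlers that
want to inspect or step past the faulting instruction.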

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sigcontext->sc_pc Passed to User
@ 2002-07-11 15:16     ` Kevin D. Kissell
  0 siblings, 0 replies; 18+ messages in thread
From: Kevin D. Kissell @ 2002-07-11 15:16 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

From: "Maciej W. Rozycki" <macro@ds2.pg.gda.pl>

[snip]

>  I believe the resumption should happen with EPC unmodified.  A handler
> may set EPC differently if it wants (possibly with longjmp() or by
> interpreting code at EPC and modifying EPC appropriately).  For the three
> signal handling possibilities, I'd do that as follows (assuming SIGBUS,
> SIGSEGV, etc. lethal signals): 
> 
> - SIG_IGN: return to EPC with no action.  A program will loop
>   indefinitely, but if that's what a user wants...

I don't think that this is the right thing to do, philosophically.
Hanging in an infinite loop and making no forward progress
is not, to me, "ignoring" an event. The old X/Open specs I've 
got say that SIGFPE, SIGILL, and SIGSEGV behavior is 
undefined if bound to SIG_IGN (curiously, they don't call 
out SIGBUS), but I think that in practical terms we need to 
provide whatever behavior people expect from Linux on
x86 and PPC.  What happens on those platforms?  A
quick look at the x86 kernel code makes me think that
they do, indeed, do the "wrong" thing and beat their
heads against the ignored event for all eternity, but I'm
insufficiently an expert in x86 trap semantics to know
for certain whether that's the case.  If it is, right or 
wrong, that's what we ought to do.

> - SIG_DFL: kill.
> 
> - HANDLER: call a handler with the signal context unmodified and let the
>   user code decide what to do.

Independently of what we do for the SIG_IGN cases,
this is important, and the user code cannot decide what
to do if it cannot know what instruction caused the fault.
Fixups on SIGFPE must be able to find the FP instruction,
which is not currently possible if it was in a branch delay
slot.  Similarly, user-mode emulation of "memory" via
signal handlers cannot work unless the loads and stores
can be identified.  But, having "done the deed", return
from the signal handler should resume at the instruction
*following* the one generating the fault, and not replay
the same instruction.  We *could* punt that to the signal
handler, but making every signal package carry its own
copy of compute_return_epc() to handle the branch
delay slot cases strikes me as being unfriendly to the
user and is arguably slightly less reliable.  I guess I'd like things 
to be rigged so that the sigcontext structure contains the address 
of the faulting instruction as the sc_pc, but where the return 
from signal goes to the address calculated by 
compute_return_epc().  But again, what do people expect 
in the "mainstream" world of x86 Linux?
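
To show what "carrying its own copy of compute_return_epc()" would mean
for user code, here is a deliberately simplified sketch. It is not the
kernel routine: it assumes 32-bit MIPS, handles only the J and BEQ
encodings, and the names resume_addr and CAUSE_BD are hypothetical. A
real version would have to cover every branch and jump form.

```c
#include <stdint.h>

#define CAUSE_BD (UINT32_C(1) << 31)   /* Cause.BD: fault was in a delay slot */

/*
 * Given the raw EPC, the Cause register, the instruction word found at EPC
 * and the saved general registers, return the address at which execution
 * should resume after skipping the faulting instruction.
 * Returns 0 for branch encodings this sketch does not decode.
 */
uint32_t resume_addr(uint32_t epc, uint32_t cause,
                     uint32_t insn_at_epc, const uint32_t regs[32])
{
    uint32_t opcode, rs, rt, delay_slot;
    int32_t offset;

    if (!(cause & CAUSE_BD))
        return epc + 4;            /* ordinary case: next instruction */

    /* With BD set, EPC points at the branch, not at the faulting insn. */
    delay_slot = epc + 4;
    opcode = insn_at_epc >> 26;

    switch (opcode) {
    case 2:                        /* J target */
        return (delay_slot & UINT32_C(0xF0000000)) |
               ((insn_at_epc & UINT32_C(0x03FFFFFF)) << 2);
    case 4:                        /* BEQ rs, rt, offset */
        rs = (insn_at_epc >> 21) & 31;
        rt = (insn_at_epc >> 16) & 31;
        offset = (int32_t)(int16_t)(insn_at_epc & 0xFFFF) * 4;
        if (regs[rs] == regs[rt])
            return delay_slot + (uint32_t)offset;   /* branch taken */
        return epc + 8;                             /* fall through */
    default:
        return 0;                  /* not handled in this sketch */
    }
}
```

Even this two-instruction subset needs the register file and the
instruction word, which is why pushing the computation into every signal
package is unattractive.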

            Regards,

            Kevin K.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sigcontext->sc_pc Passed to User
  2002-07-11 15:16     ` Kevin D. Kissell
  (?)
@ 2002-07-11 16:52     ` Maciej W. Rozycki
  -1 siblings, 0 replies; 18+ messages in thread
From: Maciej W. Rozycki @ 2002-07-11 16:52 UTC (permalink / raw)
  To: Kevin D. Kissell; +Cc: linux-mips

On Thu, 11 Jul 2002, Kevin D. Kissell wrote:

> > - SIG_IGN: return to EPC with no action.  A program will loop
> >   indefinitely, but if that's what a user wants...
> 
> I don't think that this is the right thing to do, philosophically.
> Hanging in an infinite loop and making no forward progress
> is not, to me, "ignoring" an event. The old X/Open specs I've 
> got say that SIGFPE, SIGILL, and SIGSEGV behavior is 
> undefined if bound to SIG_IGN (curiously, they don't call 
> out SIGBUS), but I think that in practical terms we need to 
> provide whatever behavior people expect from Linux on
> x86 and PPC.  What happens on those platforms?  A
> quick look at the x86 kernel code makes me think that
> they do, indeed, do the "wrong" thing and beat their
> heads against the ignored event for all eternity, but I'm
> insufficiently an expert in x86 trap semantics to know
> for certain whether that's the case.  If it is, right or 
> wrong, that's what we ought to do.

 Yes, they loop indefinitely.  That may be useful for debugging -- you may
attach to a running program and you'll be sure to get at the faulting
instruction.  Otherwise the warning from the libc manual applies:

 "If you block or ignore these signals or establish handlers for them that
return normally, your program will probably break horribly when such
signals happen, unless they are generated by `raise' or `kill' instead of
a real error."

So a user (programmer) has been warned. 

> > - HANDLER: call a handler with the signal context unmodified and let the
> >   user code decide what to do.
> 
> Independently of what we do for the SIG_IGN cases,
> this is important, and the user code cannot decide what
> to do if it cannot know what instruction caused the fault.
> Fixups on SIGFPE must be able to find the FP instruction,
> which is not currently possible if it was in a branch delay
> slot.  Similarly, user-mode emulation of "memory" via

 Well, the Cause register is passed to the userland, so only EPC needs to
be fixed. 
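
A minimal sketch of that point, assuming the handler is given the raw
(unadjusted) EPC together with the Cause register on 32-bit MIPS. The
function name faulting_insn_addr is hypothetical; the BD bit is bit 31 of
Cause.

```c
#include <stdint.h>

#define CAUSE_BD (UINT32_C(1) << 31)   /* set when the fault hit a delay slot */

/*
 * Address of the instruction that actually faulted.  When BD is set the
 * hardware leaves EPC pointing at the preceding branch, so the faulting
 * instruction is the one in its delay slot, four bytes further on.
 */
uint32_t faulting_insn_addr(uint32_t epc, uint32_t cause)
{
    return (cause & CAUSE_BD) ? epc + 4 : epc;
}
```

This works only if the saved PC has not already been advanced, which is
the fix being asked for.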

> signal handlers cannot work unless the loads and stores
> can be identified.  But, having "done the deed", return
> from the signal handler should resume at the instruction
> *following* the one generating the fault, and not replay
> the same instruction.  We *could* punt that to the signal
> handler, but making every signal package carry its own
> copy of compute_return_epc() to handle the branch
> delay slot cases strikes me as being unfriendly to the
> user and is arguably slightly less reliable.  I guess I'd like things 
> to be rigged so that the sigcontext structure contains the address 
> of the faulting instruction as the sc_pc, but where the return 
> from signal goes to the address calculated by 
> compute_return_epc().  But again, what do people expect 
> in the "mainstream" world of x86 Linux?

 ;-)

 FPE faults on the x87 fault before the *following* FP instruction (which
is a regular one or the special "wait" one).  The context of the faulting
instruction (both the instruction and data addresses and the opcode) is
saved in special registers (as usual with i386, the most complex way was
chosen) and can be retrieved by dumping the FPU context to memory (see the
"fnstenv" and "fnsave" instructions). 
 
 So the i386 is very different and can't really be used as a reference.

 However, a brief look at the Alpha port (which is mature, and the Alpha
CPU is also very similar to MIPS) reveals that the code never modifies the
saved PC in the kernel.  But again, the FPU traps happen after faulting
instructions (for older models even imprecisely -- see the search back
code in alpha_fp_emul_imprecise()).

 With current specifications I think the best way for the SIGFPE handler
(since it's somewhat special)  would be to provide the address of the
faulting instruction in siginfo_t.si_addr and have the EPC in sigcontext
set up for a continuation (that would still allow longjmp(), etc.). 
Ideally, I'd have it the other way round, i.e. EPC unchanged and siginfo_t.si_addr
containing an address to continue, so that a handler would have to
explicitly copy the address to EPC if it decided it handled the signal
successfully (so that a program doesn't continue unpredictably after an
integer division by zero, because the handler expected only real FP
faults) -- maybe we should extend siginfo_t? 

 For other exceptions, I'd just leave EPC alone.
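
For readers unfamiliar with siginfo_t.si_addr, the sketch below shows how
a handler installed with SA_SIGINFO receives it. It is an illustration
under stated assumptions, not the proposed MIPS change: it runs on Linux
and provokes SIGSEGV (where si_addr is the faulting data address) because
that can be triggered portably, whereas for SIGFPE and SIGILL POSIX
defines si_addr as the address of the faulting instruction. The names
resume_point, reported_addr, on_fault and provoke_and_report are
hypothetical.

```c
#define _GNU_SOURCE            /* MAP_ANONYMOUS on glibc */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

static sigjmp_buf resume_point;
static void *volatile reported_addr;

/* Three-argument handler: the kernel fills in info->si_addr. */
static void on_fault(int signo, siginfo_t *info, void *uctx)
{
    (void)signo;
    (void)uctx;
    reported_addr = info->si_addr;
    siglongjmp(resume_point, 1);
}

/*
 * Maps one inaccessible page, reads from it at `offset`, and returns the
 * si_addr delivered to the handler (NULL on setup failure or no fault).
 * *touched receives the address that was actually read.
 */
void *provoke_and_report(size_t offset, void **touched)
{
    struct sigaction sa, old;
    void *page;

    *touched = NULL;
    reported_addr = NULL;
    if (offset >= 4096)
        return NULL;
    page = mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;
    *touched = (char *)page + offset;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;          /* request the siginfo_t argument */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) == 0) {
        if (sigsetjmp(resume_point, 1) == 0) {
            char c = *((volatile char *)page + offset);   /* faults */
            (void)c;
        }
        sigaction(SIGSEGV, &old, NULL);
    }

    munmap(page, 4096);
    return reported_addr;
}
```

The design question in the thread is which of the two addresses (faulting
or continuation) belongs in si_addr and which in the sigcontext PC; the
mechanism for delivering si_addr is the same either way.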

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sigcontext->sc_pc Passed to User
  2002-07-11  9:08 ` Kevin D. Kissell
  (?)
  (?)
@ 2002-07-12  1:40 ` Ralf Baechle
  2002-07-12  8:00     ` Kevin D. Kissell
  -1 siblings, 1 reply; 18+ messages in thread
From: Ralf Baechle @ 2002-07-12  1:40 UTC (permalink / raw)
  To: Kevin D. Kissell; +Cc: linux-mips

On Thu, Jul 11, 2002 at 11:08:21AM +0200, Kevin D. Kissell wrote:

> In responding to an enquiry from one of MIPS' third-party
> software vendors, I noted something that seems a little
> broken to me in the current (and maybe all historical)
> MIPS/Linux kernels.  Please forgive me for opening
> old wounds if this has been beaten to death in the past.
> 
> When a user catches a signal, such as SIGBUS, the
> signal "payload" includes a pointer to a sigcontext
> structure on the stack, containing the state of the
> CPU when the exception associated with the signal
> occurred.  But not exactly.  We seem to consistently
> call compute_return_epc() before send_sig() or
> force_sig().  This results in the user being passed
> an indication of the faulting PC that is one instruction
> past the true location.  That would be no problem,
> except that the faulting instruction may have been 
> in a branch delay slot, such that there is no practical
> and reliable way for the signal handler to determine
> which instruction failed on the basis of the sigcontext
> data.
> 
> It is, of course, important that execution resume
> at the instruction following any instruction generating
> an exception/signal.  But that's not the same thing
> as saying that the sigcontext should report the resumption
> EPC instead of the faulting EPC.  There are various
> ways of dealing with this, but before going into any
> of them, I'm curious as to whether this has been 
> discussed before, and whether anyone thinks that 
> things really should be the way they are.

Our signal stackframe is almost the same as on IRIX5 which is what
some software expects.  Maybe time to checkout what IRIX does ...

  Ralf

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sigcontext->sc_pc Passed to User
@ 2002-07-12  8:00     ` Kevin D. Kissell
  0 siblings, 0 replies; 18+ messages in thread
From: Kevin D. Kissell @ 2002-07-12  8:00 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips

From: "Ralf Baechle" <ralf@oss.sgi.com>
> On Thu, Jul 11, 2002 at 11:08:21AM +0200, Kevin D. Kissell wrote:
[snip]
> > When a user catches a signal, such as SIGBUS, the
> > signal "payload" includes a pointer to a sigcontext
> > structure on the stack, containing the state of the
> > CPU when the exception associated with the signal
> > occurred.  But not exactly.  We seem to consistently
> > call compute_return_epc() before send_sig() or
> > force_sig().  This results in the user being passed
> > an indication of the faulting PC that is one instruction
> > past the true location.  That would be no problem,
> > except that the faulting instruction may have been 
> > in a branch delay slot, such that there is no practical
> > and reliable way for the signal handler to determine
> > which instruction failed on the basis of the sigcontext
> > data.
> > 
> > It is, of course, important that execution resume
> > at the instruction following any instruction generating
> > an exception/signal.  But that's not the same thing
> > as saying that the sigcontext should report the resumption
> > EPC instead of the faulting EPC.  There are various
> > ways of dealing with this, but before going into any
> > of them, I'm curious as to whether this has been 
> > discussed before, and whether anyone thinks that 
> > things really should be the way they are.
> 
> Our signal stackframe is almost the same as on IRIX5 which is what
> some software expects.  Maybe time to checkout what IRIX does ...

The IRIX team made some stunningly bad design 
decisions over the years, my favorite being "virtual
swap space" and its side effect of deliberately killing 
system daemons at random under load.  A signal scheme
such as we have now in MIPS/Linux, where a user program
*cannot* identify the instruction causing a signal if
that instruction was in the delay slot of a taken branch,
is broken from first principles.

            Kevin K.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sigcontext->sc_pc Passed to User
  2002-07-12  8:00     ` Kevin D. Kissell
  (?)
@ 2002-07-12 10:00     ` Ralf Baechle
  2002-07-12 11:49         ` Kevin D. Kissell
  2002-07-12 13:01         ` Alan Cox
  -1 siblings, 2 replies; 18+ messages in thread
From: Ralf Baechle @ 2002-07-12 10:00 UTC (permalink / raw)
  To: Kevin D. Kissell; +Cc: linux-mips

On Fri, Jul 12, 2002 at 10:00:27AM +0200, Kevin D. Kissell wrote:

> The IRIX team made some stunningly bad design 
> decisions over the years, my favorite being "virtual
> swap space" and its side effect of deliberately killing 
> system daemons at random under load.  A signal scheme
> such as we have now in MIPS/Linux, where a user program
> *cannot* identify the instruction causing a signal if
> that instruction was in the delay slot of a taken branch,
> is broken from first principles.

Certainly you're right when you say a signal handler should know which
instruction was causing a fault.  Ours is simply too poor an implementation
of their interface ...

IRIX virtual swap space is simply memory overcommit.  Linux has that too
and it's been subject to frequent religious discussions on Linux kernel.
Non-overcommit means large amounts of memory are required when forking
a new process.  The standard example is a fat bloated Mozilla forking
for printing.  Non-overcommit means you need those 50 or 100 megs of
Mozilla process size once more and if not as physical memory then at
least as swap space.  Decide yourself if you're paranoid and want that
operation to only succeed if that much memory is actually available or
if you take the risk of the fork & exec operation failing the other way.

  Ralf

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sigcontext->sc_pc Passed to User
@ 2002-07-12 11:49         ` Kevin D. Kissell
  0 siblings, 0 replies; 18+ messages in thread
From: Kevin D. Kissell @ 2002-07-12 11:49 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips

From: "Ralf Baechle" <ralf@oss.sgi.com>
> On Fri, Jul 12, 2002 at 10:00:27AM +0200, Kevin D. Kissell wrote:
> 
> > The IRIX team made some stunningly bad design 
> > decisions over the years, my favorite being "virtual
> > swap space" and its side effect of deliberately killing 
> > system daemons at random under load.  A signal scheme
> > such as we have now in MIPS/Linux, where a user program
> > *cannot* identify the instruction causing a signal if
> > that instruction was in the delay slot of a taken branch,
> > is broken from first principles.
> 
> Certainly you're right when you say a signal handler should know which
> instruction was causing a fault.  Ours is simply too poor an implementation
> of their interface ...
> 
> IRIX virtual swap space is simply memory overcommit.  Linux has that too
> and it's been subject to frequent religious discussions on Linux kernel.
> Non-overcommit means large amounts of memory are required when forking
> a new process.  The standard example is a fat bloated Mozilla forking
> for printing.  Non-overcommit means you need those 50 or 100 megs of
> Mozilla process size once more and if not as physical memory then at
> least as swap space.  Decide yourself if you're paranoid and want that
> operation to only succeed if that much memory is actually available or
> if you take the risk of the fork & exec operation failing the other way.

Whenever it's been my design responsibility, I made forks fail if
there wasn't enough backing store to handle the process.  Frankly,
there are limits to the degree to which an OS should compromise
its integrity for the sake of supporting badly conceived applications,
be they Mozilla or the SGI integrated CAD environment.  But
even if you prefer to take the "speculative" or "optimistic" model
for handling the situation, what IRIX did was insane:  When, after
having allowed too many unsupportable forks to succeed, they
detected deadlock in the swap system, they killed processes
*at random*.  Including system daemons.  At a *minimum*,
a system should only terminate processes belonging to the
user (and preferably the process group) who has been granted
speculative fork success.  Anything else is a massive "breach of
contract" for a multiuser OS.

IMHO, if someone really wanted to fix this in the OS, 
we'd get beyond the traditional Unix "fork" model.  
And if someone really wanted to avoid the problem in Mozilla or 
an IDE, one would have all subprograms launched by a tiny 
"launcher", which would receive instructions and data via some 
form of IPC, fork itself, and exec as appropriate.

But this is getting a bit off the topic.  Is anyone aware of any
IRIX applications ported to Linux that would break if we
corrected the signal payload semantics?

            Kevin K.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sigcontext->sc_pc Passed to User
@ 2002-07-12 13:01         ` Alan Cox
  0 siblings, 0 replies; 18+ messages in thread
From: Alan Cox @ 2002-07-12 13:01 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Kevin D. Kissell, linux-mips

> Non-overcommit means large amounts of memory are required when forking
> of a new process.  The standard example is a fat bloated Mozilla forking
> for printing.  Non-overcommit means you need those 50 or 100 megs of
> Mozilla process size once more and if not as physical memory then at
> least as swap space.  Decide for yourself if you're paranoid and want that
> operation to only succeed if that much memory is actually available or
> if you take the risk of the fork & exec operation failing the other way.

Your numbers are ridiculously off.

A Mozilla instance on x86 commits 17MB of potentially swap-backed memory
when viewing the Mozilla 1.0 start page.  (It's actually a bit less, but
there is a delay in the garbage collector.)

2.4.18/19-ac support non-overcommit, and it's rather useful.

Alan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sigcontext->sc_pc Passed to User
  2002-07-12 13:01         ` Alan Cox
  (?)
@ 2002-07-12 14:23         ` Ralf Baechle
  2002-07-12 15:36             ` Alan Cox
  -1 siblings, 1 reply; 18+ messages in thread
From: Ralf Baechle @ 2002-07-12 14:23 UTC (permalink / raw)
  To: Alan Cox; +Cc: Kevin D. Kissell, linux-mips

On Fri, Jul 12, 2002 at 02:01:56PM +0100, Alan Cox wrote:

> > Non-overcommit means large amounts of memory are required when forking
> > of a new process.  The standard example is a fat bloated Mozilla forking
> > for printing.  Non-overcommit means you need those 50 or 100 megs of
> > Mozilla process size once more and if not as physical memory then at
> > least as swap space.  Decide for yourself if you're paranoid and want that
> > operation to only succeed if that much memory is actually available or
> > if you take the risk of the fork & exec operation failing the other way.
> 
> Your numbers are ridiculously off.
>
> A Mozilla instance on x86 commits 17MB of potentially swap-backed memory
> when viewing the Mozilla 1.0 start page.  (It's actually a bit less, but
> there is a delay in the garbage collector.)

These were typical numbers for the last Mozilla I hacked on myself on MIPS.
It can grow larger without doing a lot.  Aside from that, this isn't Mozilla
specific; any arbitrary program that does some fork & exec thing, and
its memory size, could be chosen.

> 2.4.18/19-ac support non-overcommit, and it's rather useful.

No doubt about that.  I'm just saying non-overcommit has been the subject
of long discussions and, as usual in such religious discussions, both sides
had valid arguments.  I leave it to everybody to choose his / her own poison.

  Ralf

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sigcontext->sc_pc Passed to User
  2002-07-12 11:49         ` Kevin D. Kissell
  (?)
@ 2002-07-12 15:29         ` Ralf Baechle
  -1 siblings, 0 replies; 18+ messages in thread
From: Ralf Baechle @ 2002-07-12 15:29 UTC (permalink / raw)
  To: Kevin D. Kissell; +Cc: linux-mips

On Fri, Jul 12, 2002 at 01:49:15PM +0200, Kevin D. Kissell wrote:

> Whenever it's been my design responsibility, I made forks fail if
> there wasn't enough backing store to handle the process.  Frankly,
> there are limits to the degree to which an OS should compromise
> its integrity for the sake of supporting badly conceived applications,
> be they Mozilla or the SGI integrated CAD environment.  But
> even if you prefer to take the "speculative" or "optimistic" model
> for handling the situation, what IRIX did was insane:  When, after
> having allowed too many unsupportable forks to succeed, they
> detected deadlock in the swap system, they killed processes
> *at random*.  Including system daemons.  At a *minimum*,
> a system should only terminate processes belonging to the
> user (and preferably the process group) who has been granted
> speculative fork success.  Anything else is a massive "breach of
> contract" for a multiuser OS.

See linux/mm/oom_kill.c:oom_kill() ...

> IMHO, if someone really wanted to fix this in the OS, 
> we'd get beyond the traditional Unix "fork" model.  
> And if someone really wanted to avoid the problem in Mozilla or 
> an IDE, one would have all subprograms launched by a tiny 
> "launcher", which would receive instructions and data via some 
> form of IPC, fork itself, and exec as appropriate.

That or, more Linux-specific, a clone/vfork & exec approach.
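
A vfork-style spawn avoids duplicating the parent's address space, and
with it the commit charge, just to exec something else.  A minimal
sketch, using Python's os.posix_spawn purely for illustration; whether
the C library implements it with vfork or clone underneath is an
implementation detail assumed here, and spawn_and_wait is a
hypothetical helper name.

```python
import os
import sys


def spawn_and_wait(argv):
    """Start argv[0] via posix_spawn; return its exit status or -signal."""
    # posix_spawn does not search PATH, so argv[0] must be a full path.
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    code = spawn_and_wait([sys.executable, "-c", "raise SystemExit(5)"])
    print("child exited with", code)
```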

> But this is getting a bit off the topic.  Is anyone aware of any
> IRIX applications ported to Linux that would break if we
> corrected the signal payload semantics?

As I said, we even misimplemented the IRIX semantics.  In IRIX the
sc_pc field of the frame points to the instruction that caused the
signal, while we try to skip over it, with all the side effects that
we're just discussing.  I tried that for both trap and break instructions.

So I suggest we simply remove the compute_return_epc() calls from do_bp
and do_trap.  I haven't tested this, but I'd assume this would also be
the behaviour that gdb is expecting.  So that would follow the example
given by Linux/i386 and IRIX and should solve your ISV's problem.  What
more could we ask for?

I still have to look over the other exceptions that may call
compute_return_epc() but it seems we should do the same thing for all
of them and not call compute_return_epc if we're going to send a signal.
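
To make the ambiguity concrete, here is a toy model, not kernel code.
It assumes the usual MIPS convention that when the faulting
instruction sits in a branch delay slot, EPC holds the address of the
branch and the BD bit is set.  The function names are hypothetical;
resume_pc only mimics the role compute_return_epc() plays, and the
branch outcome is passed in rather than decoded from the instruction.

```python
INSN_SIZE = 4  # MIPS32 instructions are 4 bytes wide


def faulting_pc(epc, in_delay_slot):
    """Address of the instruction that actually raised the exception."""
    return epc + INSN_SIZE if in_delay_slot else epc


def resume_pc(epc, in_delay_slot, branch_taken=False, branch_target=None):
    """Where execution continues once the faulting instruction is skipped."""
    if not in_delay_slot:
        return epc + INSN_SIZE
    if branch_taken:
        return branch_target
    return epc + 2 * INSN_SIZE  # fall through past branch and delay slot


if __name__ == "__main__":
    # Fault in the delay slot of a taken branch at 0x1000 to 0x2000 ...
    a = resume_pc(0x1000, True, branch_taken=True, branch_target=0x2000)
    # ... and an ordinary fault at 0x1ffc resume at the same address.
    b = resume_pc(0x1FFC, False)
    print(hex(a), hex(b))
```

Both cases resume at the same address, although the faulting
instructions are at 0x1004 and 0x1ffc respectively, so a handler that
is given only the resume address cannot tell them apart.  Reporting
the unadjusted EPC, plus the delay-slot indication, keeps them
distinguishable.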

  Ralf

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sigcontext->sc_pc Passed to User
@ 2002-07-12 15:36             ` Alan Cox
  0 siblings, 0 replies; 18+ messages in thread
From: Alan Cox @ 2002-07-12 15:36 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Alan Cox, Kevin D. Kissell, linux-mips

> > A Mozilla instance on x86 commits 17MB of potentially swap-backed memory
> > when viewing the Mozilla 1.0 start page.  (It's actually a bit less, but
> > there is a delay in the garbage collector.)
> 
> These were typical numbers for the last Mozilla I hacked on myself on MIPS.
> It can grow larger without doing a lot.  Aside from that, this isn't Mozilla
> specific; any arbitrary program that does some fork & exec thing, and
> its memory size, could be chosen.

These are precise, page-accurate measurements from the real world.  What
most people forget is that very little of an ELF application is actually
swap-backed, as opposed to file-backed and read-only.
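
One rough way to see this split for a process is to total its mappings
from /proc/<pid>/maps.  The sketch below is an approximation under
stated assumptions: it treats private writable mappings as the
potentially swap-backed part and everything else as not charged, which
is simpler than the kernel's real accounting.  The name
commit_estimate is hypothetical.

```python
def commit_estimate(maps_text):
    """Return (charged_bytes, uncharged_bytes) from /proc/<pid>/maps text."""
    charged = uncharged = 0
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a mapping line
        start, end = (int(x, 16) for x in fields[0].split("-"))
        perms = fields[1]  # e.g. "r-xp" or "rw-p"
        size = end - start
        if "w" in perms and "p" in perms:
            charged += size    # private and writable: may need swap
        else:
            uncharged += size  # e.g. read-only text backed by the file
    return charged, uncharged


if __name__ == "__main__":
    # Linux only: inspect the current process.
    with open("/proc/self/maps") as f:
        print(commit_estimate(f.read()))
```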

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2002-07-12 15:36 UTC | newest]

Thread overview: 18+ messages
2002-07-11  9:08 Sigcontext->sc_pc Passed to User Kevin D. Kissell
2002-07-11  9:08 ` Kevin D. Kissell
2002-07-11 13:17 ` Maciej W. Rozycki
2002-07-11 15:16   ` Kevin D. Kissell
2002-07-11 15:16     ` Kevin D. Kissell
2002-07-11 16:52     ` Maciej W. Rozycki
2002-07-12  1:40 ` Ralf Baechle
2002-07-12  8:00   ` Kevin D. Kissell
2002-07-12  8:00     ` Kevin D. Kissell
2002-07-12 10:00     ` Ralf Baechle
2002-07-12 11:49       ` Kevin D. Kissell
2002-07-12 11:49         ` Kevin D. Kissell
2002-07-12 15:29         ` Ralf Baechle
2002-07-12 13:01       ` Alan Cox
2002-07-12 13:01         ` Alan Cox
2002-07-12 14:23         ` Ralf Baechle
2002-07-12 15:36           ` Alan Cox
2002-07-12 15:36             ` Alan Cox