public inbox for linux-ia64@vger.kernel.org
* [Linux-ia64] Bug in signal handling
@ 2001-12-02 22:05 Andreas Schwab
  2001-12-03  3:15 ` David Mosberger
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Andreas Schwab @ 2001-12-02 22:05 UTC
  To: linux-ia64

The kernel does not correctly handle interrupted syscalls that are
supposed to be restarted when two signal handlers execute nested, one
interrupting the other.  To reproduce, run this program in one terminal:

#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>

void
sigusr1 (int sig)
{
  write (2, "SIGUSR1\n", strlen ("SIGUSR1\n"));
}

void
sigusr2 (int sig)
{
  write (2, "SIGUSR2\n", strlen ("SIGUSR2\n"));
}

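int
main (void)
{
  char c;
  struct sigaction sa;

  /* Print the pid so the signals can be sent from another terminal.  */
  printf ("%d\n", getpid ());
  /* Empty mask: one handler may interrupt the other.  */
  sa.sa_handler = sigusr1;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  sigaction (SIGUSR1, &sa, NULL);
  sa.sa_handler = sigusr2;
  sigaction (SIGUSR2, &sa, NULL);
  /* Block in a syscall that should be restarted after the handlers run.  */
  read (1, &c, 1);
  return 0;
}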

Then send both SIGUSR1 and SIGUSR2 to the process from another terminal.
If they arrive close enough together, the first signal handler will be
interrupted at GATE_ADDR by the second handler.  But ia64_do_signal is
again called with in_syscall == 1, and it will call ia64_decrement_ip
before setting up the signal handler frame.  Thus when the second handler
returns through rt_sigreturn, execution resumes at GATE_ADDR - 16 and the
process crashes.
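
The arithmetic behind GATE_ADDR - 16 can be sketched with a toy model.
This is only an illustration under stated assumptions, not the kernel
code: the toy_* names, the user-space address, and the numeric value
given to GATE_ADDR here are all made up for the example, and
toy_decrement_ip leaves out details of the real ia64_decrement_ip.  The
one architectural fact it relies on is that an IA-64 instruction bundle
is 16 bytes and holds three slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BUNDLE_SIZE 16u                          /* bytes per IA-64 bundle */
#define GATE_ADDR UINT64_C(0xa000000000000000)   /* placeholder value only */

struct toy_regs {
    uint64_t ip;    /* address of the current bundle */
    unsigned ri;    /* instruction slot within the bundle, 0..2 */
};

/* Step back one instruction slot, moving to the previous bundle if needed. */
static void toy_decrement_ip(struct toy_regs *regs)
{
    assert(regs->ri <= 2);
    if (regs->ri == 0) {
        regs->ip -= BUNDLE_SIZE;
        regs->ri = 2;
    } else {
        regs->ri--;
    }
}

/* One pass of signal delivery.  Returns the state saved in the signal
   frame (where rt_sigreturn resumes) and points regs at the handler entry. */
static struct toy_regs toy_deliver(struct toy_regs *regs, bool in_syscall)
{
    if (in_syscall)
        toy_decrement_ip(regs);       /* arrange for the syscall to restart */
    struct toy_regs saved = *regs;
    regs->ip = GATE_ADDR;             /* handler is entered at GATE_ADDR */
    regs->ri = 0;
    return saved;
}

/* Address the second handler returns to when it nests inside the first. */
uint64_t nested_return_ip(bool second_pass_in_syscall)
{
    /* Hypothetical user address just past the syscall instruction. */
    struct toy_regs regs = { UINT64_C(0x4000000000001000), 2 };

    toy_deliver(&regs, true);         /* first signal: restart is wanted */
    return toy_deliver(&regs, second_pass_in_syscall).ip;
}
```

In this model, nested_return_ip(true) gives GATE_ADDR - 16, which is the
bad return address described above, while nested_return_ip(false) gives
GATE_ADDR itself.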

Andreas.

-- 
Andreas Schwab                                  "And now for something
Andreas.Schwab@suse.de				completely different."
SuSE Labs, SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5



* Re: [Linux-ia64] Bug in signal handling
From: David Mosberger @ 2001-12-03  3:15 UTC
  To: linux-ia64

>>>>> On 02 Dec 2001 23:05:29 +0100, Andreas Schwab <schwab@suse.de> said:

  Andreas> Then send both SIGUSR1 and SIGUSR2 to the process from
  Andreas> another terminal.  If they arrive close enough together,
  Andreas> the first signal handler will be interrupted at GATE_ADDR
  Andreas> by the second handler.  But ia64_do_signal is again called
  Andreas> with in_syscall == 1, and it will call ia64_decrement_ip
  Andreas> before setting up the signal handler frame.  Thus when the
  Andreas> second handler returns through rt_sigreturn, execution
  Andreas> resumes at GATE_ADDR - 16 and the process crashes.

I think I know what the problem is.  If I'm right, then, oddly enough,
I discovered the same bug just yesterday (while proof-reading the
book, no less... ;-).

Anyhow, I'll look into it on Monday.

Thanks,

	--david



* Re: [Linux-ia64] Bug in signal handling
From: David Mosberger @ 2001-12-04 19:53 UTC
  To: linux-ia64

[I'm resending this and a couple of other mails because there were
 some problems with the linuxia64.org mailer.]

>>>>> On 02 Dec 2001 23:05:29 +0100, Andreas Schwab <schwab@suse.de> said:

  Andreas> The kernel does not correctly handle interrupted syscalls
  Andreas> that are supposed to be restarted when two nested signal
  Andreas> handlers are executed at the same time.  To reproduce run
  Andreas> this program in one terminal:

The attached patch should fix this problem.  It turned out that the
kernel exit path sometimes ended up checking for pending signals
multiple times, which is no longer valid.  The hardest part about this
bug was finding an automatic way of testing it.  I have that now so
hopefully the bug won't ever rear its ugly head again.
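
One way such an automatic test could look is sketched here.  This is not
the test referred to above, only a guess at the general shape: a child
installs both SA_RESTART handlers and blocks in read() on a pipe, and
the parent sends the two signals back to back, then supplies the byte
the read() is waiting for.  All names (run_nested_signal_test,
send_pair, child_main) and the 10 ms settle delay are assumptions made
for the example.  Because hitting the bug depends on timing, a pass
proves little on its own and the test would need to be run repeatedly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t got_usr1, got_usr2;

static void on_usr1(int sig) { (void)sig; got_usr1 = 1; }
static void on_usr2(int sig) { (void)sig; got_usr2 = 1; }

/* Child: install both handlers with SA_RESTART, announce readiness,
   then block in read().  Never returns. */
static void child_main(int ready_fd, int data_fd)
{
    struct sigaction sa;
    char c;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);       /* empty mask: handlers may nest */
    sa.sa_flags = SA_RESTART;
    sa.sa_handler = on_usr1;
    sigaction(SIGUSR1, &sa, NULL);
    sa.sa_handler = on_usr2;
    sigaction(SIGUSR2, &sa, NULL);

    if (write(ready_fd, "r", 1) != 1)
        _exit(2);
    /* With SA_RESTART this read() must survive both signals. */
    ssize_t n = read(data_fd, &c, 1);
    _exit(n == 1 && got_usr1 && got_usr2 ? 0 : 1);
}

/* Send both signals with no delay in between. */
static void send_pair(pid_t pid)
{
    assert(pid > 0);                /* never signal a process group by accident */
    kill(pid, SIGUSR1);
    kill(pid, SIGUSR2);
}

/* Returns 0 if the child's read() was restarted and completed normally,
   1 if the child crashed or misbehaved, -1 on a setup error. */
int run_nested_signal_test(void)
{
    int ready[2], data[2], status;
    struct timespec settle = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    char c;
    pid_t pid;

    if (pipe(ready) != 0 || pipe(data) != 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(ready[0]);
        close(data[1]);
        child_main(ready[1], data[0]);
    }
    close(ready[1]);
    close(data[0]);

    if (read(ready[0], &c, 1) != 1)     /* wait until handlers are installed */
        return -1;
    nanosleep(&settle, NULL);           /* give the child time to block in read() */
    send_pair(pid);
    if (write(data[1], "x", 1) != 1)    /* let the restarted read() complete */
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    close(ready[0]);
    close(data[1]);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```

A caller would loop over run_nested_signal_test() many times and treat
any nonzero result as a failure; a child killed by a fault shows up as
a return value of 1.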

The patch also fixes a potential race condition that could cause a
CPU to not always run the highest-priority task in the system.

Richard, I'm cc'ing you because it appears to me that Alpha Linux may
have the same problem.  I looked into how this bug came about: the x86
version was changed as part of the softirq rewrite that happened in
2.4.6, and I simply missed it.  Perhaps other platforms suffer from
this bug as well (MIPS appears to have been fixed in 2.4.10 though).

Please let me know how this works.

	--david

--- linux-2.4.16/arch/ia64/kernel/entry.S	Mon Nov 26 11:18:20 2001
+++ lia64-kdb/arch/ia64/kernel/entry.S	Mon Dec  3 16:58:41 2001
@@ -519,6 +519,8 @@
 	lfetch.fault [sp]
 	movl r14=.restart
 	;;
+	// need_resched and signals atomic test
+(pUser)	rsm psr.i
 	mov.ret.sptk rp=r14,.restart
 .restart:
 	adds r17=IA64_TASK_NEED_RESCHED_OFFSET,r13
@@ -539,8 +541,6 @@
 (pUser)	cmp.ne.unc p7,p0=r17,r0			// current->need_resched != 0?
 (pUser)	cmp.ne.unc p8,p0=r18,r0			// current->sigpending != 0?
 	;;
-	adds r2=PT(R8)+16,r12
-	adds r3=PT(R9)+16,r12
 #ifdef CONFIG_PERFMON
 (p9)	br.call.spnt.many b7=pfm_block_on_overflow
 #endif
@@ -549,7 +549,10 @@
 #else
 (p7)	br.call.spnt.many b7=schedule
 #endif
-(p8)	br.call.spnt.many b7=handle_signal_delivery	// check & deliver pending signals
+(p8)	br.call.spnt.many rp=handle_signal_delivery	// check & deliver pending signals (once)
+	;;
+.ret9:	adds r2=PT(R8)+16,r12
+	adds r3=PT(R9)+16,r12
 	;;
 	// start restoring the state saved on the kernel stack (struct pt_regs):
 	ld8.fill r8=[r2],16
@@ -582,7 +585,7 @@
 	ld8.fill r30=[r2],16
 	ld8.fill r31=[r3],16
 	;;
-	rsm psr.i | psr.ic	// initiate turning off of interrupts & interruption collection
+	rsm psr.i | psr.ic	// initiate turning off of interrupt and interruption collection
 	invala			// invalidate ALAT
 	;;
 	ld8 r1=[r2],16		// ar.ccv
@@ -601,7 +604,7 @@
 	mov ar.fpsr=r13
 	mov b0=r14
 	;;
-	srlz.i			// ensure interrupts & interruption collection are off
+	srlz.i			// ensure interruption collection is off
 	mov b7=r15
 	;;
 	bsw.0			// switch back to bank 0



* Re: [Linux-ia64] Bug in signal handling
From: Richard Henderson @ 2001-12-05  1:14 UTC
  To: linux-ia64

On Mon, Dec 03, 2001 at 06:19:52PM -0800, David Mosberger wrote:
> Richard, I'm cc'ing you because it appears to me that Alpha Linux may
> have the same problem.

I don't think it does:

        bne     $5,signal_return
restore_all:
        RESTORE_ALL
        call_pal PAL_rti
[...]
signal_return:
        mov     $30,$17
        br      $1,do_switch_stack
        mov     $30,$18
        mov     $31,$16
        jsr     $26,do_signal
        bsr     $1,undo_switch_stack
        br      restore_all

That is, after do_signal we restore registers and return.

Unless you meant something else?


r~



* Re: [Linux-ia64] Bug in signal handling
From: David Mosberger @ 2001-12-05  2:15 UTC
  To: linux-ia64

>>>>> On Tue, 4 Dec 2001 17:14:26 -0800, Richard Henderson <rth@redhat.com> said:

  Richard> On Mon, Dec 03, 2001 at 06:19:52PM -0800, David Mosberger
  Richard> wrote:
  >> Richard, I'm cc'ing you because it appears to me that Alpha Linux
  >> may have the same problem.

  Richard> I don't think it does:

  Richard> That is, after do_signal we restore registers and return.

  Richard> Unless you meant something else?

Oh, sorry, I was referring to the *other* problem... ;-)

What I meant is that the checks for rescheduling
(current->need_resched) and signal delivery (current->sigpending)
need to be done with interrupts turned off, and interrupts need
to stay off until user space is reached.  Otherwise, an interrupt
could wake up a higher-priority task or post a signal between the
check and the return to user space.
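
The window can be illustrated with a small user-space model.  This is a
sketch under stated assumptions, not kernel code: struct toy_cpu and
both toy_* functions are invented for the example, irq_enabled merely
stands in for psr.i, and the interrupt is forced to arrive at the worst
possible moment.  The same reasoning applies to current->sigpending.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_cpu {
    bool irq_enabled;    /* stands in for psr.i */
    bool irq_pending;    /* an interrupt is waiting to be taken */
    bool need_resched;   /* stands in for current->need_resched */
};

/* Take a pending interrupt if interrupts are enabled.  The handler
   wakes a higher-priority task, which sets need_resched. */
static void toy_take_irq(struct toy_cpu *cpu)
{
    if (cpu->irq_enabled && cpu->irq_pending) {
        cpu->irq_pending = false;
        cpu->need_resched = true;
    }
}

/* Models the exit path with an interrupt raised between the check and
   the return to user space.  Returns true if user space is reached
   with need_resched set but unnoticed. */
bool toy_exit_misses_resched(bool mask_irqs_before_check)
{
    struct toy_cpu cpu = { true, false, false };

    if (mask_irqs_before_check)
        cpu.irq_enabled = false;          /* interrupts off before the test */

    bool saw_resched = cpu.need_resched;  /* the check */

    cpu.irq_pending = true;               /* interrupt arrives in the window */
    toy_take_irq(&cpu);

    if (saw_resched)
        cpu.need_resched = false;         /* schedule() would run here */

    /* User space is reached.  With interrupts masked, the interrupt is
       still pending and is taken only afterwards, so its own exit path
       gets to perform the check again. */
    return cpu.need_resched;
}
```

Without masking, toy_exit_misses_resched(false) reports a missed
reschedule; with masking, toy_exit_misses_resched(true) does not, since
the interrupt stays pending until user space has been reached.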

I didn't see this interrupt disabling in the Alpha version of entry.S,
but I have to admit my Alpha assembly is getting quite rusty.

	--david


