public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* avoid infinite loop in x86_64 interrupt return
@ 2005-05-04  5:01 Andrea Arcangeli
  2005-05-04  9:00 ` Rafael J. Wysocki
  2005-05-04 13:22 ` Andi Kleen
  0 siblings, 2 replies; 7+ messages in thread
From: Andrea Arcangeli @ 2005-05-04  5:01 UTC (permalink / raw)
  To: Andi Kleen, Andrew Morton; +Cc: linux-kernel

Hello,

A few minutes ago I've got an unkillable task in R state with vanilla
2.6.12-rc3 on x86_64, luckily system was still up with the other cpu (on
the desktop, so I had no kgdb environment set). Debugging revelaed rdi
corrupt when entering retint_signal (not set to $_TIF_WORK_MASK as
expected). This lead the rdx&rdi to return 0x20000 -> infinite loop.
Precisely rdi is set to ffff810030923f58 instead of $_TIF_WORK_MASK. So
it was the combination of ...2xxxx as rsp with TIF_IA32 in the task
flags. After noticing the rdi screwup the bug was quite clear: rdi was
set to pt_regs instead of $_TIF_WORK_MASK. Of course rsp is set to
ffff810030923f58 too (which also means do_notify_resume didn't clobber
rdi even if it could).

The below should fix the problem, I've no idea how to reproduce the
problem but it works on basic testing. The task looping was java (32bit,
that's where the 0x20000 come from), but it wasn't me starting java, it
must have been some random website (java was hanging around with 100%
system time for half an hour once I noticed it).

Signed-off-by: Andrea Arcangeli <andrea@suse.de>

--- 2.6.12-rc3/arch/x86_64/kernel/entry.S.orig	2005-05-04 06:47:02.000000000 +0200
+++ 2.6.12-rc3/arch/x86_64/kernel/entry.S	2005-05-04 06:50:34.000000000 +0200
@@ -489,6 +489,7 @@ retint_signal:
 	movq %rsp,%rdi		# &pt_regs
 	call do_notify_resume
 	RESTORE_REST
+	movl $_TIF_WORK_MASK,%edi
 	cli
 	GET_THREAD_INFO(%rcx)	
 	jmp retint_check

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: avoid infinite loop in x86_64 interrupt return
  2005-05-04  5:01 avoid infinite loop in x86_64 interrupt return Andrea Arcangeli
@ 2005-05-04  9:00 ` Rafael J. Wysocki
  2005-05-04 13:31   ` Andrea Arcangeli
  2005-05-04 13:22 ` Andi Kleen
  1 sibling, 1 reply; 7+ messages in thread
From: Rafael J. Wysocki @ 2005-05-04  9:00 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andi Kleen, Andrew Morton, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1877 bytes --]

Hi,

On Wednesday, 4 of May 2005 07:01, Andrea Arcangeli wrote:
> Hello,
> 
> A few minutes ago I've got an unkillable task in R state with vanilla
> 2.6.12-rc3 on x86_64, luckily system was still up with the other cpu (on
> the desktop, so I had no kgdb environment set). Debugging revelaed rdi
> corrupt when entering retint_signal (not set to $_TIF_WORK_MASK as
> expected). This lead the rdx&rdi to return 0x20000 -> infinite loop.
> Precisely rdi is set to ffff810030923f58 instead of $_TIF_WORK_MASK. So
> it was the combination of ...2xxxx as rsp with TIF_IA32 in the task
> flags. After noticing the rdi screwup the bug was quite clear: rdi was
> set to pt_regs instead of $_TIF_WORK_MASK. Of course rsp is set to
> ffff810030923f58 too (which also means do_notify_resume didn't clobber
> rdi even if it could).
> 
> The below should fix the problem, I've no idea how to reproduce the
> problem but it works on basic testing. The task looping was java (32bit,
> that's where the 0x20000 come from), but it wasn't me starting java, it
> must have been some random website (java was hanging around with 100%
> system time for half an hour once I noticed it).
> 
> Signed-off-by: Andrea Arcangeli <andrea@suse.de>
> 
> --- 2.6.12-rc3/arch/x86_64/kernel/entry.S.orig	2005-05-04 06:47:02.000000000 +0200
> +++ 2.6.12-rc3/arch/x86_64/kernel/entry.S	2005-05-04 06:50:34.000000000 +0200
> @@ -489,6 +489,7 @@ retint_signal:
>  	movq %rsp,%rdi		# &pt_regs
>  	call do_notify_resume
>  	RESTORE_REST
> +	movl $_TIF_WORK_MASK,%edi
>  	cli
>  	GET_THREAD_INFO(%rcx)	
>  	jmp retint_check
> -

You also need to add two missing clis.  Please have a look at the attached
patch from Andi.

Greets,
Rafael


-- 
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
		-- Lewis Carroll "Alice's Adventures in Wonderland"

[-- Attachment #2: 2.6.12-rc3-unkillable-java-ak.patch --]
[-- Type: text/x-diff, Size: 803 bytes --]

Signed-off-by: Andi Kleen <ak@suse.de>


diff -u linux-2.6.12rc3/arch/x86_64/kernel/entry.S-o linux-2.6.12rc3/arch/x86_64/kernel/entry.S
--- linux-2.6.12rc3/arch/x86_64/kernel/entry.S-o	2005-04-22 12:48:11.000000000 +0200
+++ linux-2.6.12rc3/arch/x86_64/kernel/entry.S	2005-04-27 15:52:49.305183345 +0200
@@ -296,6 +296,7 @@
 	call syscall_trace_leave
 	popq %rdi
 	andl $~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP),%edi
+	cli	
 	jmp int_restore_rest
 	
 int_signal:
@@ -307,6 +308,7 @@
 1:	movl $_TIF_NEED_RESCHED,%edi	
 int_restore_rest:
 	RESTORE_REST
+	cli	
 	jmp int_with_check
 	CFI_ENDPROC
 		
@@ -490,7 +492,8 @@
 	call do_notify_resume
 	RESTORE_REST
 	cli
-	GET_THREAD_INFO(%rcx)	
+	GET_THREAD_INFO(%rcx)
+	movl $_TIF_WORK_MASK,%edi	
 	jmp retint_check
 
 #ifdef CONFIG_PREEMPT



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: avoid infinite loop in x86_64 interrupt return
  2005-05-04  5:01 avoid infinite loop in x86_64 interrupt return Andrea Arcangeli
  2005-05-04  9:00 ` Rafael J. Wysocki
@ 2005-05-04 13:22 ` Andi Kleen
  2005-05-04 13:32   ` Andrea Arcangeli
  1 sibling, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2005-05-04 13:22 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andi Kleen, Andrew Morton, linux-kernel


It is already fixed in mainline. Actually I think it was a merging 
problem I had actually fixed it before the last merge in my tree, but 
some patches got lost :/

-Andi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: avoid infinite loop in x86_64 interrupt return
  2005-05-04  9:00 ` Rafael J. Wysocki
@ 2005-05-04 13:31   ` Andrea Arcangeli
  2005-05-04 18:21     ` Andi Kleen
  0 siblings, 1 reply; 7+ messages in thread
From: Andrea Arcangeli @ 2005-05-04 13:31 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Andi Kleen, Andrew Morton, linux-kernel

On Wed, May 04, 2005 at 11:00:32AM +0200, Rafael J. Wysocki wrote:
> You also need to add two missing clis.  Please have a look at the attached
> patch from Andi.

Those two clis seems unrelated, so I don't see why I should mix them in
the same patch. I couldn't trigger the other problems, only the one with
rdi corruption.

Note that those clis seems superflous, cli is only needed before swapgs,
so adding cli before swapgs in retint_swapgs sounds a better idea than
to keep irq off for a longer time for no apparent good reason. But I've
no real idea why those cli are needed so I guess I must be missing
something. there's no commentary attached to your patch that can exlain
why the cli are needed _way_ before calling swapgs.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: avoid infinite loop in x86_64 interrupt return
  2005-05-04 13:22 ` Andi Kleen
@ 2005-05-04 13:32   ` Andrea Arcangeli
  0 siblings, 0 replies; 7+ messages in thread
From: Andrea Arcangeli @ 2005-05-04 13:32 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Morton, linux-kernel

On Wed, May 04, 2005 at 03:22:15PM +0200, Andi Kleen wrote:
> 
> It is already fixed in mainline. Actually I think it was a merging 
> problem I had actually fixed it before the last merge in my tree, but 
> some patches got lost :/

Ok thanks. I'm still not in sync with git so I didn't notice, I looked
into mercurial instead of git infact.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: avoid infinite loop in x86_64 interrupt return
  2005-05-04 13:31   ` Andrea Arcangeli
@ 2005-05-04 18:21     ` Andi Kleen
  2005-05-04 18:32       ` Andrea Arcangeli
  0 siblings, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2005-05-04 18:21 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Rafael J. Wysocki, Andi Kleen, Andrew Morton, linux-kernel

On Wed, May 04, 2005 at 03:31:29PM +0200, Andrea Arcangeli wrote:
> On Wed, May 04, 2005 at 11:00:32AM +0200, Rafael J. Wysocki wrote:
> > You also need to add two missing clis.  Please have a look at the attached
> > patch from Andi.
> 
> Those two clis seems unrelated, so I don't see why I should mix them in
> the same patch. I couldn't trigger the other problems, only the one with
> rdi corruption.

THere was a second patch which essentially got the line you posted
together with the missing clis.

I originally fixed it in a slightly different way (in the version
that got lost), but this one was equivalent.

> 
> Note that those clis seems superflous, cli is only needed before swapgs,
> so adding cli before swapgs in retint_swapgs sounds a better idea than
> to keep irq off for a longer time for no apparent good reason. But I've
> no real idea why those cli are needed so I guess I must be missing
> something. there's no commentary attached to your patch that can exlain
> why the cli are needed _way_ before calling swapgs.

To avoid losing schedule events and signals. Between checking for them
and returning to user space interrupts need to be off. When they are
reenabled everything needs to be rechecked.

-Andi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: avoid infinite loop in x86_64 interrupt return
  2005-05-04 18:21     ` Andi Kleen
@ 2005-05-04 18:32       ` Andrea Arcangeli
  0 siblings, 0 replies; 7+ messages in thread
From: Andrea Arcangeli @ 2005-05-04 18:32 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Rafael J. Wysocki, Andrew Morton, linux-kernel

On Wed, May 04, 2005 at 08:21:43PM +0200, Andi Kleen wrote:
> To avoid losing schedule events and signals. Between checking for them
> and returning to user space interrupts need to be off. When they are
> reenabled everything needs to be rechecked.

All right good point, the latency issues, they'd wait the timeslice to
expire instead of being processed immediatly. That couldn't be noticed
without some measurement but it's sure worth fixing agreed, thanks!

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2005-05-04 18:34 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-05-04  5:01 avoid infinite loop in x86_64 interrupt return Andrea Arcangeli
2005-05-04  9:00 ` Rafael J. Wysocki
2005-05-04 13:31   ` Andrea Arcangeli
2005-05-04 18:21     ` Andi Kleen
2005-05-04 18:32       ` Andrea Arcangeli
2005-05-04 13:22 ` Andi Kleen
2005-05-04 13:32   ` Andrea Arcangeli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox