* avoid infinite loop in x86_64 interrupt return @ 2005-05-04 5:01 Andrea Arcangeli 2005-05-04 9:00 ` Rafael J. Wysocki 2005-05-04 13:22 ` Andi Kleen 0 siblings, 2 replies; 7+ messages in thread From: Andrea Arcangeli @ 2005-05-04 5:01 UTC (permalink / raw) To: Andi Kleen, Andrew Morton; +Cc: linux-kernel Hello, A few minutes ago I've got an unkillable task in R state with vanilla 2.6.12-rc3 on x86_64, luckily system was still up with the other cpu (on the desktop, so I had no kgdb environment set). Debugging revelaed rdi corrupt when entering retint_signal (not set to $_TIF_WORK_MASK as expected). This lead the rdx&rdi to return 0x20000 -> infinite loop. Precisely rdi is set to ffff810030923f58 instead of $_TIF_WORK_MASK. So it was the combination of ...2xxxx as rsp with TIF_IA32 in the task flags. After noticing the rdi screwup the bug was quite clear: rdi was set to pt_regs instead of $_TIF_WORK_MASK. Of course rsp is set to ffff810030923f58 too (which also means do_notify_resume didn't clobber rdi even if it could). The below should fix the problem, I've no idea how to reproduce the problem but it works on basic testing. The task looping was java (32bit, that's where the 0x20000 come from), but it wasn't me starting java, it must have been some random website (java was hanging around with 100% system time for half an hour once I noticed it). Signed-off-by: Andrea Arcangeli <andrea@suse.de> --- 2.6.12-rc3/arch/x86_64/kernel/entry.S.orig 2005-05-04 06:47:02.000000000 +0200 +++ 2.6.12-rc3/arch/x86_64/kernel/entry.S 2005-05-04 06:50:34.000000000 +0200 @@ -489,6 +489,7 @@ retint_signal: movq %rsp,%rdi # &pt_regs call do_notify_resume RESTORE_REST + movl $_TIF_WORK_MASK,%edi cli GET_THREAD_INFO(%rcx) jmp retint_check ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: avoid infinite loop in x86_64 interrupt return 2005-05-04 5:01 avoid infinite loop in x86_64 interrupt return Andrea Arcangeli @ 2005-05-04 9:00 ` Rafael J. Wysocki 2005-05-04 13:31 ` Andrea Arcangeli 2005-05-04 13:22 ` Andi Kleen 1 sibling, 1 reply; 7+ messages in thread From: Rafael J. Wysocki @ 2005-05-04 9:00 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Andi Kleen, Andrew Morton, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1877 bytes --] Hi, On Wednesday, 4 of May 2005 07:01, Andrea Arcangeli wrote: > Hello, > > A few minutes ago I've got an unkillable task in R state with vanilla > 2.6.12-rc3 on x86_64, luckily system was still up with the other cpu (on > the desktop, so I had no kgdb environment set). Debugging revelaed rdi > corrupt when entering retint_signal (not set to $_TIF_WORK_MASK as > expected). This lead the rdx&rdi to return 0x20000 -> infinite loop. > Precisely rdi is set to ffff810030923f58 instead of $_TIF_WORK_MASK. So > it was the combination of ...2xxxx as rsp with TIF_IA32 in the task > flags. After noticing the rdi screwup the bug was quite clear: rdi was > set to pt_regs instead of $_TIF_WORK_MASK. Of course rsp is set to > ffff810030923f58 too (which also means do_notify_resume didn't clobber > rdi even if it could). > > The below should fix the problem, I've no idea how to reproduce the > problem but it works on basic testing. The task looping was java (32bit, > that's where the 0x20000 come from), but it wasn't me starting java, it > must have been some random website (java was hanging around with 100% > system time for half an hour once I noticed it). > > Signed-off-by: Andrea Arcangeli <andrea@suse.de> > > --- 2.6.12-rc3/arch/x86_64/kernel/entry.S.orig 2005-05-04 06:47:02.000000000 +0200 > +++ 2.6.12-rc3/arch/x86_64/kernel/entry.S 2005-05-04 06:50:34.000000000 +0200 > @@ -489,6 +489,7 @@ retint_signal: > movq %rsp,%rdi # &pt_regs > call do_notify_resume > RESTORE_REST > + movl $_TIF_WORK_MASK,%edi > cli > GET_THREAD_INFO(%rcx) > jmp retint_check > - You also need to add two missing clis. Please have a look at the attached patch from Andi. Greets, Rafael -- - Would you tell me, please, which way I ought to go from here? - That depends a good deal on where you want to get to. -- Lewis Carroll "Alice's Adventures in Wonderland" [-- Attachment #2: 2.6.12-rc3-unkillable-java-ak.patch --] [-- Type: text/x-diff, Size: 803 bytes --] Signed-off-by: Andi Kleen <ak@suse.de> diff -u linux-2.6.12rc3/arch/x86_64/kernel/entry.S-o linux-2.6.12rc3/arch/x86_64/kernel/entry.S --- linux-2.6.12rc3/arch/x86_64/kernel/entry.S-o 2005-04-22 12:48:11.000000000 +0200 +++ linux-2.6.12rc3/arch/x86_64/kernel/entry.S 2005-04-27 15:52:49.305183345 +0200 @@ -296,6 +296,7 @@ call syscall_trace_leave popq %rdi andl $~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP),%edi + cli jmp int_restore_rest int_signal: @@ -307,6 +308,7 @@ 1: movl $_TIF_NEED_RESCHED,%edi int_restore_rest: RESTORE_REST + cli jmp int_with_check CFI_ENDPROC @@ -490,7 +492,8 @@ call do_notify_resume RESTORE_REST cli - GET_THREAD_INFO(%rcx) + GET_THREAD_INFO(%rcx) + movl $_TIF_WORK_MASK,%edi jmp retint_check #ifdef CONFIG_PREEMPT ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: avoid infinite loop in x86_64 interrupt return 2005-05-04 9:00 ` Rafael J. Wysocki @ 2005-05-04 13:31 ` Andrea Arcangeli 2005-05-04 18:21 ` Andi Kleen 0 siblings, 1 reply; 7+ messages in thread From: Andrea Arcangeli @ 2005-05-04 13:31 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Andi Kleen, Andrew Morton, linux-kernel On Wed, May 04, 2005 at 11:00:32AM +0200, Rafael J. Wysocki wrote: > You also need to add two missing clis. Please have a look at the attached > patch from Andi. Those two clis seems unrelated, so I don't see why I should mix them in the same patch. I couldn't trigger the other problems, only the one with rdi corruption. Note that those clis seems superflous, cli is only needed before swapgs, so adding cli before swapgs in retint_swapgs sounds a better idea than to keep irq off for a longer time for no apparent good reason. But I've no real idea why those cli are needed so I guess I must be missing something. there's no commentary attached to your patch that can exlain why the cli are needed _way_ before calling swapgs. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: avoid infinite loop in x86_64 interrupt return 2005-05-04 13:31 ` Andrea Arcangeli @ 2005-05-04 18:21 ` Andi Kleen 2005-05-04 18:32 ` Andrea Arcangeli 0 siblings, 1 reply; 7+ messages in thread From: Andi Kleen @ 2005-05-04 18:21 UTC (permalink / raw) To: Andrea Arcangeli Cc: Rafael J. Wysocki, Andi Kleen, Andrew Morton, linux-kernel On Wed, May 04, 2005 at 03:31:29PM +0200, Andrea Arcangeli wrote: > On Wed, May 04, 2005 at 11:00:32AM +0200, Rafael J. Wysocki wrote: > > You also need to add two missing clis. Please have a look at the attached > > patch from Andi. > > Those two clis seems unrelated, so I don't see why I should mix them in > the same patch. I couldn't trigger the other problems, only the one with > rdi corruption. THere was a second patch which essentially got the line you posted together with the missing clis. I originally fixed it in a slightly different way (in the version that got lost), but this one was equivalent. > > Note that those clis seems superflous, cli is only needed before swapgs, > so adding cli before swapgs in retint_swapgs sounds a better idea than > to keep irq off for a longer time for no apparent good reason. But I've > no real idea why those cli are needed so I guess I must be missing > something. there's no commentary attached to your patch that can exlain > why the cli are needed _way_ before calling swapgs. To avoid losing schedule events and signals. Between checking for them and returning to user space interrupts need to be off. When they are reenabled everything needs to be rechecked. -Andi ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: avoid infinite loop in x86_64 interrupt return 2005-05-04 18:21 ` Andi Kleen @ 2005-05-04 18:32 ` Andrea Arcangeli 0 siblings, 0 replies; 7+ messages in thread From: Andrea Arcangeli @ 2005-05-04 18:32 UTC (permalink / raw) To: Andi Kleen; +Cc: Rafael J. Wysocki, Andrew Morton, linux-kernel On Wed, May 04, 2005 at 08:21:43PM +0200, Andi Kleen wrote: > To avoid losing schedule events and signals. Between checking for them > and returning to user space interrupts need to be off. When they are > reenabled everything needs to be rechecked. All right good point, the latency issues, they'd wait the timeslice to expire instead of being processed immediatly. That couldn't be noticed without some measurement but it's sure worth fixing agreed, thanks! ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: avoid infinite loop in x86_64 interrupt return 2005-05-04 5:01 avoid infinite loop in x86_64 interrupt return Andrea Arcangeli 2005-05-04 9:00 ` Rafael J. Wysocki @ 2005-05-04 13:22 ` Andi Kleen 2005-05-04 13:32 ` Andrea Arcangeli 1 sibling, 1 reply; 7+ messages in thread From: Andi Kleen @ 2005-05-04 13:22 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Andi Kleen, Andrew Morton, linux-kernel It is already fixed in mainline. Actually I think it was a merging problem I had actually fixed it before the last merge in my tree, but some patches got lost :/ -Andi ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: avoid infinite loop in x86_64 interrupt return 2005-05-04 13:22 ` Andi Kleen @ 2005-05-04 13:32 ` Andrea Arcangeli 0 siblings, 0 replies; 7+ messages in thread From: Andrea Arcangeli @ 2005-05-04 13:32 UTC (permalink / raw) To: Andi Kleen; +Cc: Andrew Morton, linux-kernel On Wed, May 04, 2005 at 03:22:15PM +0200, Andi Kleen wrote: > > It is already fixed in mainline. Actually I think it was a merging > problem I had actually fixed it before the last merge in my tree, but > some patches got lost :/ Ok thanks. I'm still not in sync with git so I didn't notice, I looked into mercurial instead of git infact. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2005-05-04 18:34 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-05-04 5:01 avoid infinite loop in x86_64 interrupt return Andrea Arcangeli 2005-05-04 9:00 ` Rafael J. Wysocki 2005-05-04 13:31 ` Andrea Arcangeli 2005-05-04 18:21 ` Andi Kleen 2005-05-04 18:32 ` Andrea Arcangeli 2005-05-04 13:22 ` Andi Kleen 2005-05-04 13:32 ` Andrea Arcangeli
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox