From: Rik van Riel <riel@redhat.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Stanislav Meduna <stano@meduna.org>,
	"linux-rt-users@vger.kernel.org" <linux-rt-users@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, Linus Torvalds <torvalds@linux-foundation.org>,
	Hai Huang <hhuang@redhat.com>
Subject: Re: [PATCH - sort of] x86: Livelock in handle_pte_fault
Date: Wed, 22 May 2013 08:33:52 -0400
Message-ID: <519CBB30.3060200@redhat.com>
In-Reply-To: <1369183168.6828.168.camel@gandalf.local.home>

[-- Attachment #1: Type: text/plain, Size: 2854 bytes --]

On 05/21/2013 08:39 PM, Steven Rostedt wrote:
> On Fri, 2013-05-17 at 10:42 +0200, Stanislav Meduna wrote:
>> Hi all,
>>
>> I don't know whether this is linux-rt specific or applies to
>> the mainline too, so I'll repeat some things the linux-rt
>> readers already know.
>>
>> Environment:
>>
>> - Geode LX or Celeron M
>> - _not_ CONFIG_SMP
>> - linux 3.4 with realtime patches and full preempt configured
>> - an application consisting of several mostly RR-class threads
>
> The threads do a mlockall too, right? I'm not sure mlock will lock memory
> for a new thread's stack.
>
>> - the application runs with mlockall()
>
> With both MCL_FUTURE and MCL_CURRENT set, right?
>
>> - there is no swap
>
> Hmm, doesn't mean that code can't be swapped out, as it is just mapped
> from the file it came from. But you'd think mlockall would prevent that.
>
>>
>> Problem:
>>
>> - after several hours to 1-2 weeks some of the threads start to loop
>>    in the following way
>>
>>    0d...0 62811.755382: function:  do_page_fault
>>    0....0 62811.755386: function:     handle_mm_fault
>>    0....0 62811.755389: function:        handle_pte_fault
>>    0d...0 62811.755394: function:  do_page_fault
>>    0....0 62811.755396: function:     handle_mm_fault
>>    0....0 62811.755398: function:        handle_pte_fault
>>    0d...0 62811.755402: function:  do_page_fault
>>    0....0 62811.755404: function:     handle_mm_fault
>>    0....0 62811.755406: function:        handle_pte_fault
>>
>>    and stay in the loop until the RT throttling gets activated.
>>    One of the faulting addresses was in code (after returning
>>    from a syscall), a second one in stack (inside put_user right
>>    before a syscall ends), both were surely mapped.
>>
>> - After RT throttler activates it somehow magically fixes itself,
>>    probably (not verified) because another _process_ gets scheduled.
>>    When throttled the RR and FF threads are not allowed to run for
>>    a while (20 ms in my configuration). The livelock lasts around
>>    1-3 seconds, and there is a SCHED_OTHER process that runs every
>>    2 seconds.
>
> Hmm, if there was a missed TLB flush, and we are faulting due to a bad
> TLB table, and it goes into an infinite faulting loop, the only thing
> that will stop it is the RT throttle. Then a new task gets scheduled,
> and we flush the TLB and everything is fine again.

That sounds like maybe we DO want a TLB flush on spurious
page faults, so we get rid of this problem.
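
To make the failure mode described above concrete, here is a
minimal user-space toy model. It is only a sketch: it assumes a
single page and a single cached translation, it is not kernel
code, and every name in it (toy_mmu, toy_write, toy_handle_fault,
toy_faults_until_progress) is hypothetical. It shows a cached
translation that is staler than the page table, and a fault
handler that either does or does not drop that cached entry when
it finds nothing to fix.

```c
#include <stdbool.h>

/* Toy model only: one page, one cached translation. Names are hypothetical. */
struct toy_mmu {
    bool pte_writable;  /* permission stored in the in-memory page table */
    bool tlb_valid;     /* true if a translation is currently cached */
    bool tlb_writable;  /* permission recorded in the cached translation */
};

/* Try to write through the cached translation; false means "would fault". */
bool toy_write(struct toy_mmu *m)
{
    if (!m->tlb_valid) {
        /* TLB miss: reload the translation from the page table. */
        m->tlb_valid = true;
        m->tlb_writable = m->pte_writable;
    }
    return m->tlb_writable;
}

/*
 * Fault handler. If the page table already allows the access, the fault
 * is spurious; optionally invalidate the cached translation in that case.
 */
void toy_handle_fault(struct toy_mmu *m, bool flush_on_spurious)
{
    if (!m->pte_writable)
        m->pte_writable = true;   /* a real fix-up of the page table */
    else if (flush_on_spurious)
        m->tlb_valid = false;     /* drop the stale cached entry */
}

/*
 * Start with a correct page table but a stale read-only cached entry.
 * Returns the number of faults taken before the write succeeds, or -1
 * if max_faults is reached without progress (the modelled livelock).
 */
int toy_faults_until_progress(bool flush_on_spurious, int max_faults)
{
    struct toy_mmu m = {
        .pte_writable = true,
        .tlb_valid = true,
        .tlb_writable = false,
    };
    int faults = 0;

    while (!toy_write(&m)) {
        if (faults == max_faults)
            return -1;
        toy_handle_fault(&m, flush_on_spurious);
        faults++;
    }
    return faults;
}
```

In this model, toy_faults_until_progress(true, 1000) makes
progress after a single fault, while the non-flushing variant
keeps faulting until the cap is hit. Whether real hardware ever
keeps such a stale entry across a fault is exactly the open
question here; the model only assumes that it can.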

Last fall we thought this problem could not happen on x86,
but your bug report suggests that it might.

We can get flush_tlb_fix_spurious_fault to do a local TLB
invalidate of just the address in question by removing the
x86-specific dummy version, so that x86 falls back to the
asm-generic version, which actually performs the flush.
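
The fallback relies on the usual "#ifndef" override pattern
between arch and generic headers. The sketch below reproduces
that pattern in a stand-alone file. It rests on assumptions: the
generic definition is written as a call to flush_tlb_page(), which
is how I understand the asm-generic header, and flush_tlb_page()
itself is replaced by a hypothetical counting stub. The names
flush_count and spurious_fault_flushes are invented for the
example.

```c
#include <stddef.h>

/* Hypothetical stub standing in for the arch's per-page TLB invalidate. */
static int flush_count;

static void flush_tlb_page(void *vma, unsigned long address)
{
    (void)vma;
    (void)address;
    flush_count++;
}

/*
 * "Arch header" part. Uncommenting the next line emulates the x86 dummy
 * that the patch removes; the generic definition below is then skipped.
 */
/* #define flush_tlb_fix_spurious_fault(vma, address) do { } while (0) */

/*
 * "Generic header" part (assumed shape): only provide a definition if the
 * arch did not supply one.
 */
#ifndef flush_tlb_fix_spurious_fault
#define flush_tlb_fix_spurious_fault(vma, address) flush_tlb_page(vma, address)
#endif

/* Returns how many per-page flushes one spurious-fault fix-up performed. */
int spurious_fault_flushes(void)
{
    flush_count = 0;
    flush_tlb_fix_spurious_fault(NULL, 0x1000UL);
    return flush_count;
}
```

With the dummy commented out, as above, one fix-up results in one
call to the stub. With the dummy defined, the macro expands to
nothing and no flush happens.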

Can you test the attached patch?

-- 
All rights reversed

[-- Attachment #2: flush-tlb-on-spurious-fault.patch --]
[-- Type: text/x-patch, Size: 1003 bytes --]

Subject: x86,mm: flush TLB on spurious fault

It appears that certain x86 CPUs do not automatically flush the
TLB entry that caused a page fault, causing spurious faults to
loop forever under certain circumstances.

Remove the dummy flush_tlb_fix_spurious_fault define, so x86
falls back to the asm-generic version, which does do a local
TLB flush.

Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Stanislav Meduna <stano@meduna.org>
---
 arch/x86/include/asm/pgtable.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 1e67223..43e7966 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -729,8 +729,6 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm,
 	pte_update(mm, addr, ptep);
 }
 
-#define flush_tlb_fix_spurious_fault(vma, address) do { } while (0)
-
 #define mk_pmd(page, pgprot)   pfn_pmd(page_to_pfn(page), (pgprot))
 
 #define  __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS
