From: Ingo Molnar <mingo@elte.hu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Karel Zak <kzak@redhat.com>,
Arjan van de Ven <arjan@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Nick Piggin <nickpiggin@yahoo.com.au>,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [kerneloops] regression in 2.6.27 wrt "lock_page" and the "hwclock" program
Date: Mon, 13 Oct 2008 17:26:33 +0200 [thread overview]
Message-ID: <20081013152633.GA6523@elte.hu> (raw)
In-Reply-To: <alpine.LFD.2.00.0810121308210.3402@nehalem.linux-foundation.org>
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> [ Ingo added to Cc just because this is obviously a x86 tree thing, and
> tries to unify some trivial parts of the VM paths at the same time. ]
applied to tip/x86/urgent, thanks Linus!
do you agree with the changelog and can i add your Signed-off-by ?
Ingo
--------------------->
>From bdbe15671f9b3ad1264ed174f62563774f0abef9 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun, 12 Oct 2008 13:16:12 -0700
Subject: [PATCH] x86/mm: do not trigger a kernel warning if user-space disables interrupts and generates a page fault
Arjan reported a spike in the following bug pattern in v2.6.27:
http://www.kerneloops.org/searchweek.php?search=lock_page
which happens because hwclock started triggering warnings due to
a (correct) might_sleep() check in the MM code.
The warning occurs because hwclock uses this dubious sequence of
code to run "atomic" code:
static unsigned long
atomic(const char *name, unsigned long (*op)(unsigned long),
unsigned long arg)
{
unsigned long v;
__asm__ volatile ("cli");
v = (*op)(arg);
__asm__ volatile ("sti");
return v;
}
Then it pagefaults in that "atomic" section, triggering the warning.
There is no way the kernel could provide "atomicity" in this path,
a page fault is a cannot-continue machine event so the kernel has to
wait for the page to be filled in.
Even if it was just a minor fault we'd have to take locks and might have
to spend quite a bit of time with interrupts disabled - not nice to irq
latencies in general.
So instead just enable interrupts in the pagefault path unconditionally
if we come from user-space, and handle the fault.
Also, while touching this code, unify some trivial parts of the x86
VM paths at the same time.
Reported-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/mm/fault.c | 30 +++++++++++-------------------
1 files changed, 11 insertions(+), 19 deletions(-)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a742d75..ac2ad78 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -645,24 +645,23 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
}
-#ifdef CONFIG_X86_32
- /* It's safe to allow irq's after cr2 has been saved and the vmalloc
- fault has been handled. */
- if (regs->flags & (X86_EFLAGS_IF | X86_VM_MASK))
- local_irq_enable();
-
/*
- * If we're in an interrupt, have no user context or are running in an
- * atomic region then we must not take the fault.
+ * It's safe to allow irq's after cr2 has been saved and the
+ * vmalloc fault has been handled.
+ *
+ * User-mode registers count as a user access even for any
+ * potential system fault or CPU buglet.
*/
- if (in_atomic() || !mm)
- goto bad_area_nosemaphore;
-#else /* CONFIG_X86_64 */
- if (likely(regs->flags & X86_EFLAGS_IF))
+ if (user_mode_vm(regs)) {
+ local_irq_enable();
+ error_code |= PF_USER;
+ } else if (regs->flags & X86_EFLAGS_IF)
local_irq_enable();
+#ifdef CONFIG_X86_64
if (unlikely(error_code & PF_RSVD))
pgtable_bad(address, regs, error_code);
+#endif
/*
* If we're in an interrupt, have no user context or are running in an
@@ -671,14 +670,7 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
if (unlikely(in_atomic() || !mm))
goto bad_area_nosemaphore;
- /*
- * User-mode registers count as a user access even for any
- * potential system fault or CPU buglet.
- */
- if (user_mode_vm(regs))
- error_code |= PF_USER;
again:
-#endif
/* When running in the kernel we expect faults to occur only to
* addresses in user space. All other faults represent errors in the
* kernel and should generate an OOPS. Unfortunately, in the case of an
next prev parent reply other threads:[~2008-10-13 15:27 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-10-05 0:44 [kerneloops] regression in 2.6.27 wrt "lock_page" and the "hwclock" program Arjan van de Ven
2008-10-05 4:52 ` Andrew Morton
2008-10-05 15:11 ` Arjan van de Ven
2008-10-05 17:27 ` Andrew Morton
2008-10-05 17:38 ` Arjan van de Ven
2008-10-12 20:00 ` Karel Zak
2008-10-12 20:16 ` Linus Torvalds
2008-10-13 14:55 ` Arjan van de Ven
2008-10-14 21:04 ` Karel Zak
2008-10-13 15:26 ` Ingo Molnar [this message]
2008-10-13 15:40 ` Alan Cox
2008-10-13 15:44 ` Linus Torvalds
2008-10-13 16:02 ` Ingo Molnar
2008-10-13 16:08 ` Linus Torvalds
2008-10-13 16:21 ` Ingo Molnar
2008-10-12 19:46 ` Karel Zak
2008-10-05 18:18 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081013152633.GA6523@elte.hu \
--to=mingo@elte.hu \
--cc=akpm@linux-foundation.org \
--cc=arjan@infradead.org \
--cc=hpa@zytor.com \
--cc=kzak@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=nickpiggin@yahoo.com.au \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.