All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Karel Zak <kzak@redhat.com>,
	Arjan van de Ven <arjan@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [kerneloops] regression in 2.6.27 wrt "lock_page" and the "hwclock" program
Date: Mon, 13 Oct 2008 17:26:33 +0200	[thread overview]
Message-ID: <20081013152633.GA6523@elte.hu> (raw)
In-Reply-To: <alpine.LFD.2.00.0810121308210.3402@nehalem.linux-foundation.org>


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> [ Ingo added to Cc just because this is obviously a x86 tree thing, and 
>   tries to unify some trivial parts of the VM paths at the same time. ]

applied to tip/x86/urgent, thanks Linus!

do you agree with the changelog and can i add your Signed-off-by ?

	Ingo

--------------------->
>From bdbe15671f9b3ad1264ed174f62563774f0abef9 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun, 12 Oct 2008 13:16:12 -0700
Subject: [PATCH] x86/mm: do not trigger a kernel warning if user-space disables interrupts and generates a page fault

Arjan reported a spike in the following bug pattern in v2.6.27:

   http://www.kerneloops.org/searchweek.php?search=lock_page

which happens because hwclock started triggering warnings due to
a (correct) might_sleep() check in the MM code.

The warning occurs because hwclock uses this dubious sequence of
code to run "atomic" code:

  static unsigned long
  atomic(const char *name, unsigned long (*op)(unsigned long),
         unsigned long arg)
  {
    unsigned long v;
    __asm__ volatile ("cli");
    v = (*op)(arg);
    __asm__ volatile ("sti");
    return v;
  }

Then it pagefaults in that "atomic" section, triggering the warning.

There is no way the kernel could provide "atomicity" in this path,
a page fault is a cannot-continue machine event so the kernel has to
wait for the page to be filled in.

Even if it was just a minor fault we'd have to take locks and might have
to spend quite a bit of time with interrupts disabled - not nice to irq
latencies in general.

So instead just enable interrupts in the pagefault path unconditionally
if we come from user-space, and handle the fault.

Also, while touching this code, unify some trivial parts of the x86
VM paths at the same time.

Reported-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/mm/fault.c |   30 +++++++++++-------------------
 1 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a742d75..ac2ad78 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -645,24 +645,23 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
 	}
 
 
-#ifdef CONFIG_X86_32
-	/* It's safe to allow irq's after cr2 has been saved and the vmalloc
-	   fault has been handled. */
-	if (regs->flags & (X86_EFLAGS_IF | X86_VM_MASK))
-		local_irq_enable();
-
 	/*
-	 * If we're in an interrupt, have no user context or are running in an
-	 * atomic region then we must not take the fault.
+	 * It's safe to allow irq's after cr2 has been saved and the
+	 * vmalloc fault has been handled.
+	 *
+	 * User-mode registers count as a user access even for any
+	 * potential system fault or CPU buglet.
 	 */
-	if (in_atomic() || !mm)
-		goto bad_area_nosemaphore;
-#else /* CONFIG_X86_64 */
-	if (likely(regs->flags & X86_EFLAGS_IF))
+	if (user_mode_vm(regs)) {
+		local_irq_enable();
+		error_code |= PF_USER;
+	} else if (regs->flags & X86_EFLAGS_IF)
 		local_irq_enable();
 
+#ifdef CONFIG_X86_64
 	if (unlikely(error_code & PF_RSVD))
 		pgtable_bad(address, regs, error_code);
+#endif
 
 	/*
 	 * If we're in an interrupt, have no user context or are running in an
@@ -671,14 +670,7 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
 	if (unlikely(in_atomic() || !mm))
 		goto bad_area_nosemaphore;
 
-	/*
-	 * User-mode registers count as a user access even for any
-	 * potential system fault or CPU buglet.
-	 */
-	if (user_mode_vm(regs))
-		error_code |= PF_USER;
 again:
-#endif
 	/* When running in the kernel we expect faults to occur only to
 	 * addresses in user space.  All other faults represent errors in the
 	 * kernel and should generate an OOPS.  Unfortunately, in the case of an

  parent reply	other threads:[~2008-10-13 15:27 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-05  0:44 [kerneloops] regression in 2.6.27 wrt "lock_page" and the "hwclock" program Arjan van de Ven
2008-10-05  4:52 ` Andrew Morton
2008-10-05 15:11   ` Arjan van de Ven
2008-10-05 17:27     ` Andrew Morton
2008-10-05 17:38       ` Arjan van de Ven
2008-10-12 20:00         ` Karel Zak
2008-10-12 20:16           ` Linus Torvalds
2008-10-13 14:55             ` Arjan van de Ven
2008-10-14 21:04               ` Karel Zak
2008-10-13 15:26             ` Ingo Molnar [this message]
2008-10-13 15:40               ` Alan Cox
2008-10-13 15:44               ` Linus Torvalds
2008-10-13 16:02                 ` Ingo Molnar
2008-10-13 16:08                   ` Linus Torvalds
2008-10-13 16:21                     ` Ingo Molnar
2008-10-12 19:46       ` Karel Zak
2008-10-05 18:18   ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081013152633.GA6523@elte.hu \
    --to=mingo@elte.hu \
    --cc=akpm@linux-foundation.org \
    --cc=arjan@infradead.org \
    --cc=hpa@zytor.com \
    --cc=kzak@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nickpiggin@yahoo.com.au \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.