public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: akpm@linux-foundation.org, Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
	akpm@osdl.org, "H. Peter Anvin" <hpa@zytor.com>,
	Jeremy Fitzhardinge <jeremy@goop.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	"Frank Ch. Eigler" <fche@redhat.com>
Subject: [patch 02/37] x86_64 page fault NMI-safe
Date: Thu, 24 Apr 2008 11:03:26 -0400	[thread overview]
Message-ID: <20080424151352.660545603@polymtl.ca> (raw)
In-Reply-To: 20080424150324.802695381@polymtl.ca

[-- Attachment #1: x86_64-page-fault-nmi-safe.patch --]
[-- Type: text/plain, Size: 3462 bytes --]

> I think you're vastly overestimating what is sane to do from an NMI
> context.  It is utterly and totally insane to assume vmalloc is available
> in NMI.
>
>       -hpa
>

Ok, please tell me where I am wrong then.. by looking into
arch/x86/mm/fault.c, I see that vmalloc_sync_all() touches pgd_list
entries while the pgd_lock spinlock is taken, with interrupts disabled.
So it's protected against concurrent pgd_list modification from

a - vmalloc_sync_all() on other CPUs
b - local interrupts

However, a completely normal interrupt can come on a remote CPU, run
vmalloc_fault() and issue a set_pgd concurrently. Therefore I conclude
this interrupt disable is not there to insure any kind of protection
against concurrent updates.

Also, we see that vmalloc_fault has comments such as :

(for x86_32)
         * Do _not_ use "current" here. We might be inside
         * an interrupt in the middle of a task switch..

So it takes the pgd_addr from cr3, not from current. Using only the
stack/registers makes this NMI-safe even if "current" is invalid when
the NMI comes. This is caused by the fact that __switch_to will update
the registers before updating current_task without disabling interrupts.

You are right in that x86_64 does not seems to play as safely as x86_32
on this matter; it uses current->mm. Probably it shouldn't assume
"current" is valid. Actually, I don't see where x86_64 disables
interrupts around __switch_to, so this would seem to be a race
condition. Or have I missed something ?

(Ingo)
> > the scheduler disables interrupts around __switch_to(). (x86 does 
> > not set __ARCH_WANT_INTERRUPTS_ON_CTXSW)
>
(Mathieu)
> Ok, so I guess it's only useful to NMIs then. However, it makes me
> wonder why this comment was there in the first place on x86_32
> vmalloc_fault() and why it uses read_cr3() :
>
>         * Do _not_ use "current" here. We might be inside
>         * an interrupt in the middle of a task switch..
(Ingo)
hm, i guess it's still useful to keep the
__ARCH_WANT_INTERRUPTS_ON_CTXSW case working too. On -rt we used to
enable it to squeeze a tiny bit more latency out of the system.


Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: akpm@osdl.org
CC: mingo@elte.hu
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
---
 arch/x86/mm/fault.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6-sched-devel/arch/x86/mm/fault.c
===================================================================
--- linux-2.6-sched-devel.orig/arch/x86/mm/fault.c	2008-04-22 20:04:02.000000000 -0400
+++ linux-2.6-sched-devel/arch/x86/mm/fault.c	2008-04-22 20:09:50.000000000 -0400
@@ -535,6 +535,7 @@ static int vmalloc_fault(struct pt_regs 
 	}
 	return 0;
 #else
+	unsigned long pgd_paddr;
 	pgd_t *pgd, *pgd_ref;
 	pud_t *pud, *pud_ref;
 	pmd_t *pmd, *pmd_ref;
@@ -548,7 +549,8 @@ static int vmalloc_fault(struct pt_regs 
 	   happen within a race in page table update. In the later
 	   case just flush. */
 
-	pgd = pgd_offset(current->mm ?: &init_mm, address);
+	pgd_paddr = read_cr3();
+	pgd = __va(pgd_paddr) + pgd_index(address);
 	pgd_ref = pgd_offset_k(address);
 	if (pgd_none(*pgd_ref))
 		return -1;

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  parent reply	other threads:[~2008-04-24 15:20 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-04-24 15:03 [patch 00/37] Linux Kernel Markers instrumentation for sched-devel.git Mathieu Desnoyers
2008-04-24 15:03 ` [patch 01/37] Stringify support commas Mathieu Desnoyers
2008-04-24 15:03 ` Mathieu Desnoyers [this message]
2008-04-24 15:03 ` [patch 03/37] Change Alpha active count bit Mathieu Desnoyers
2008-04-24 15:03 ` [patch 04/37] Change avr32 " Mathieu Desnoyers
2008-04-24 15:03 ` [patch 05/37] x86 NMI-safe INT3 and Page Fault Mathieu Desnoyers
2008-04-24 15:03 ` [patch 06/37] Kprobes - use a mutex to protect the instruction pages list Mathieu Desnoyers
2008-04-24 15:03 ` [patch 07/37] Kprobes - do not use kprobes mutex in arch code Mathieu Desnoyers
2008-04-24 15:03 ` [patch 08/37] Kprobes - declare kprobe_mutex static Mathieu Desnoyers
2008-04-24 15:03 ` [patch 09/37] Fix sched-devel text_poke Mathieu Desnoyers
2008-04-24 15:03 ` [patch 10/37] Text Edit Lock - Architecture Independent Code Mathieu Desnoyers
2008-04-24 15:03 ` [patch 11/37] Text Edit Lock - kprobes architecture independent support Mathieu Desnoyers
2008-04-24 15:03 ` [patch 12/37] Add all cpus option to stop machine run Mathieu Desnoyers
2008-04-24 15:03 ` [patch 13/37] Immediate Values - Architecture Independent Code Mathieu Desnoyers
2008-04-25 14:55   ` Ingo Molnar
2008-04-26  9:36     ` Ingo Molnar
2008-04-26 11:09       ` Ingo Molnar
2008-04-26 14:17         ` Mathieu Desnoyers
2008-04-24 15:03 ` [patch 14/37] Immediate Values - Kconfig menu in EMBEDDED Mathieu Desnoyers
2008-04-24 15:03 ` [patch 15/37] Immediate Values - x86 Optimization Mathieu Desnoyers
2008-04-24 15:03 ` [patch 16/37] Add text_poke and sync_core to powerpc Mathieu Desnoyers
2008-04-24 15:03 ` [patch 17/37] Immediate Values - Powerpc Optimization Mathieu Desnoyers
2008-04-24 15:03 ` [patch 18/37] Immediate Values - Documentation Mathieu Desnoyers
2008-04-24 15:03 ` [patch 19/37] Immediate Values Support init Mathieu Desnoyers
2008-04-24 15:03 ` [patch 20/37] Immediate Values - Move Kprobes x86 restore_interrupt to kdebug.h Mathieu Desnoyers
2008-04-24 15:03 ` [patch 21/37] Add __discard section to x86 Mathieu Desnoyers
2008-04-24 15:03 ` [patch 22/37] Immediate Values - x86 Optimization NMI and MCE support Mathieu Desnoyers
2008-04-24 15:03 ` [patch 23/37] Immediate Values - Powerpc Optimization NMI " Mathieu Desnoyers
2008-04-24 15:03 ` [patch 24/37] Immediate Values Use Arch NMI and MCE Support Mathieu Desnoyers
2008-04-24 15:03 ` [patch 25/37] Immediate Values - Jump Mathieu Desnoyers
2008-04-24 15:03 ` [patch 26/37] Scheduler Profiling - Use Immediate Values Mathieu Desnoyers
2008-04-24 15:03 ` [patch 27/37] From: Adrian Bunk <bunk@kernel.org> Mathieu Desnoyers
2008-04-28  9:54   ` Adrian Bunk
2008-04-28 12:37     ` Mathieu Desnoyers
2008-04-24 15:03 ` [patch 28/37] Markers - remove extra format argument Mathieu Desnoyers
2008-04-24 15:03 ` [patch 29/37] Markers - define non optimized marker Mathieu Desnoyers
2008-04-24 15:03 ` [patch 30/37] Linux Kernel Markers - Use Immediate Values Mathieu Desnoyers
2008-04-24 15:03 ` [patch 31/37] Markers use imv jump Mathieu Desnoyers
2008-04-24 15:03 ` [patch 32/37] Port ftrace to markers Mathieu Desnoyers
2008-04-24 15:03 ` [patch 33/37] LTTng instrumentation fs Mathieu Desnoyers
2008-04-24 15:03 ` [patch 34/37] LTTng instrumentation ipc Mathieu Desnoyers
2008-04-24 23:02   ` Alexey Dobriyan
2008-04-25  2:15     ` Frank Ch. Eigler
2008-04-25 12:56       ` Mathieu Desnoyers
2008-04-25 13:17         ` [RFC] system-wide in-kernel syscall tracing Mathieu Desnoyers
2008-05-04 21:04         ` [patch 34/37] LTTng instrumentation ipc Alexey Dobriyan
2008-05-04 20:39           ` Frank Ch. Eigler
2008-04-24 15:03 ` [patch 35/37] LTTng instrumentation kernel Mathieu Desnoyers
2008-04-24 15:04 ` [patch 36/37] LTTng instrumentation mm Mathieu Desnoyers
2008-04-28  2:12   ` Masami Hiramatsu
2008-04-24 15:04 ` [patch 37/37] LTTng instrumentation net Mathieu Desnoyers
2008-04-24 15:52   ` Pavel Emelyanov
2008-04-24 16:13     ` Mathieu Desnoyers
2008-04-24 16:30       ` Pavel Emelyanov
2008-04-26 19:38 ` [patch 00/37] Linux Kernel Markers instrumentation for sched-devel.git Peter Zijlstra
2008-04-26 20:11   ` Mathieu Desnoyers
2008-04-27 10:00     ` Peter Zijlstra
2008-05-04 15:08       ` Mathieu Desnoyers
2008-04-28 18:22   ` Ingo Molnar
2008-04-28 18:36   ` Andrew Morton
2008-04-28 18:40     ` Christoph Hellwig
2008-04-28 18:47       ` Andrew Morton
2008-04-28 18:49         ` Christoph Hellwig
2008-04-28 19:01         ` KOSAKI Motohiro
2008-04-28 19:52     ` Peter Zijlstra
2008-04-28 22:25       ` Masami Hiramatsu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080424151352.660545603@polymtl.ca \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=akpm@linux-foundation.org \
    --cc=akpm@osdl.org \
    --cc=fche@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jeremy@goop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox