From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: akpm@linux-foundation.org, Ingo Molnar <mingo@elte.hu>,
linux-kernel@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
akpm@osdl.org, "H. Peter Anvin" <hpa@zytor.com>,
Jeremy Fitzhardinge <jeremy@goop.org>,
Steven Rostedt <rostedt@goodmis.org>,
"Frank Ch. Eigler" <fche@redhat.com>
Subject: [patch 02/37] x86_64 page fault NMI-safe
Date: Thu, 24 Apr 2008 11:03:26 -0400 [thread overview]
Message-ID: <20080424151352.660545603@polymtl.ca> (raw)
In-Reply-To: 20080424150324.802695381@polymtl.ca
[-- Attachment #1: x86_64-page-fault-nmi-safe.patch --]
[-- Type: text/plain, Size: 3462 bytes --]
> I think you're vastly overestimating what is sane to do from an NMI
> context. It is utterly and totally insane to assume vmalloc is available
> in NMI.
>
> -hpa
>
Ok, please tell me where I am wrong then.. by looking into
arch/x86/mm/fault.c, I see that vmalloc_sync_all() touches pgd_list
entries while the pgd_lock spinlock is taken, with interrupts disabled.
So it's protected against concurrent pgd_list modification from
a - vmalloc_sync_all() on other CPUs
b - local interrupts
However, a completely normal interrupt can come on a remote CPU, run
vmalloc_fault() and issue a set_pgd concurrently. Therefore I conclude
this interrupt disable is not there to insure any kind of protection
against concurrent updates.
Also, we see that vmalloc_fault has comments such as :
(for x86_32)
* Do _not_ use "current" here. We might be inside
* an interrupt in the middle of a task switch..
So it takes the pgd_addr from cr3, not from current. Using only the
stack/registers makes this NMI-safe even if "current" is invalid when
the NMI comes. This is caused by the fact that __switch_to will update
the registers before updating current_task without disabling interrupts.
You are right in that x86_64 does not seems to play as safely as x86_32
on this matter; it uses current->mm. Probably it shouldn't assume
"current" is valid. Actually, I don't see where x86_64 disables
interrupts around __switch_to, so this would seem to be a race
condition. Or have I missed something ?
(Ingo)
> > the scheduler disables interrupts around __switch_to(). (x86 does
> > not set __ARCH_WANT_INTERRUPTS_ON_CTXSW)
>
(Mathieu)
> Ok, so I guess it's only useful to NMIs then. However, it makes me
> wonder why this comment was there in the first place on x86_32
> vmalloc_fault() and why it uses read_cr3() :
>
> * Do _not_ use "current" here. We might be inside
> * an interrupt in the middle of a task switch..
(Ingo)
hm, i guess it's still useful to keep the
__ARCH_WANT_INTERRUPTS_ON_CTXSW case working too. On -rt we used to
enable it to squeeze a tiny bit more latency out of the system.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: akpm@osdl.org
CC: mingo@elte.hu
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
---
arch/x86/mm/fault.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Index: linux-2.6-sched-devel/arch/x86/mm/fault.c
===================================================================
--- linux-2.6-sched-devel.orig/arch/x86/mm/fault.c 2008-04-22 20:04:02.000000000 -0400
+++ linux-2.6-sched-devel/arch/x86/mm/fault.c 2008-04-22 20:09:50.000000000 -0400
@@ -535,6 +535,7 @@ static int vmalloc_fault(struct pt_regs
}
return 0;
#else
+ unsigned long pgd_paddr;
pgd_t *pgd, *pgd_ref;
pud_t *pud, *pud_ref;
pmd_t *pmd, *pmd_ref;
@@ -548,7 +549,8 @@ static int vmalloc_fault(struct pt_regs
happen within a race in page table update. In the later
case just flush. */
- pgd = pgd_offset(current->mm ?: &init_mm, address);
+ pgd_paddr = read_cr3();
+ pgd = __va(pgd_paddr) + pgd_index(address);
pgd_ref = pgd_offset_k(address);
if (pgd_none(*pgd_ref))
return -1;
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
next prev parent reply other threads:[~2008-04-24 15:20 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-04-24 15:03 [patch 00/37] Linux Kernel Markers instrumentation for sched-devel.git Mathieu Desnoyers
2008-04-24 15:03 ` [patch 01/37] Stringify support commas Mathieu Desnoyers
2008-04-24 15:03 ` Mathieu Desnoyers [this message]
2008-04-24 15:03 ` [patch 03/37] Change Alpha active count bit Mathieu Desnoyers
2008-04-24 15:03 ` [patch 04/37] Change avr32 " Mathieu Desnoyers
2008-04-24 15:03 ` [patch 05/37] x86 NMI-safe INT3 and Page Fault Mathieu Desnoyers
2008-04-24 15:03 ` [patch 06/37] Kprobes - use a mutex to protect the instruction pages list Mathieu Desnoyers
2008-04-24 15:03 ` [patch 07/37] Kprobes - do not use kprobes mutex in arch code Mathieu Desnoyers
2008-04-24 15:03 ` [patch 08/37] Kprobes - declare kprobe_mutex static Mathieu Desnoyers
2008-04-24 15:03 ` [patch 09/37] Fix sched-devel text_poke Mathieu Desnoyers
2008-04-24 15:03 ` [patch 10/37] Text Edit Lock - Architecture Independent Code Mathieu Desnoyers
2008-04-24 15:03 ` [patch 11/37] Text Edit Lock - kprobes architecture independent support Mathieu Desnoyers
2008-04-24 15:03 ` [patch 12/37] Add all cpus option to stop machine run Mathieu Desnoyers
2008-04-24 15:03 ` [patch 13/37] Immediate Values - Architecture Independent Code Mathieu Desnoyers
2008-04-25 14:55 ` Ingo Molnar
2008-04-26 9:36 ` Ingo Molnar
2008-04-26 11:09 ` Ingo Molnar
2008-04-26 14:17 ` Mathieu Desnoyers
2008-04-24 15:03 ` [patch 14/37] Immediate Values - Kconfig menu in EMBEDDED Mathieu Desnoyers
2008-04-24 15:03 ` [patch 15/37] Immediate Values - x86 Optimization Mathieu Desnoyers
2008-04-24 15:03 ` [patch 16/37] Add text_poke and sync_core to powerpc Mathieu Desnoyers
2008-04-24 15:03 ` [patch 17/37] Immediate Values - Powerpc Optimization Mathieu Desnoyers
2008-04-24 15:03 ` [patch 18/37] Immediate Values - Documentation Mathieu Desnoyers
2008-04-24 15:03 ` [patch 19/37] Immediate Values Support init Mathieu Desnoyers
2008-04-24 15:03 ` [patch 20/37] Immediate Values - Move Kprobes x86 restore_interrupt to kdebug.h Mathieu Desnoyers
2008-04-24 15:03 ` [patch 21/37] Add __discard section to x86 Mathieu Desnoyers
2008-04-24 15:03 ` [patch 22/37] Immediate Values - x86 Optimization NMI and MCE support Mathieu Desnoyers
2008-04-24 15:03 ` [patch 23/37] Immediate Values - Powerpc Optimization NMI " Mathieu Desnoyers
2008-04-24 15:03 ` [patch 24/37] Immediate Values Use Arch NMI and MCE Support Mathieu Desnoyers
2008-04-24 15:03 ` [patch 25/37] Immediate Values - Jump Mathieu Desnoyers
2008-04-24 15:03 ` [patch 26/37] Scheduler Profiling - Use Immediate Values Mathieu Desnoyers
2008-04-24 15:03 ` [patch 27/37] From: Adrian Bunk <bunk@kernel.org> Mathieu Desnoyers
2008-04-28 9:54 ` Adrian Bunk
2008-04-28 12:37 ` Mathieu Desnoyers
2008-04-24 15:03 ` [patch 28/37] Markers - remove extra format argument Mathieu Desnoyers
2008-04-24 15:03 ` [patch 29/37] Markers - define non optimized marker Mathieu Desnoyers
2008-04-24 15:03 ` [patch 30/37] Linux Kernel Markers - Use Immediate Values Mathieu Desnoyers
2008-04-24 15:03 ` [patch 31/37] Markers use imv jump Mathieu Desnoyers
2008-04-24 15:03 ` [patch 32/37] Port ftrace to markers Mathieu Desnoyers
2008-04-24 15:03 ` [patch 33/37] LTTng instrumentation fs Mathieu Desnoyers
2008-04-24 15:03 ` [patch 34/37] LTTng instrumentation ipc Mathieu Desnoyers
2008-04-24 23:02 ` Alexey Dobriyan
2008-04-25 2:15 ` Frank Ch. Eigler
2008-04-25 12:56 ` Mathieu Desnoyers
2008-04-25 13:17 ` [RFC] system-wide in-kernel syscall tracing Mathieu Desnoyers
2008-05-04 21:04 ` [patch 34/37] LTTng instrumentation ipc Alexey Dobriyan
2008-05-04 20:39 ` Frank Ch. Eigler
2008-04-24 15:03 ` [patch 35/37] LTTng instrumentation kernel Mathieu Desnoyers
2008-04-24 15:04 ` [patch 36/37] LTTng instrumentation mm Mathieu Desnoyers
2008-04-24 15:04 ` Mathieu Desnoyers
2008-04-28 2:12 ` Masami Hiramatsu
2008-04-28 2:12 ` Masami Hiramatsu
2008-04-24 15:04 ` [patch 37/37] LTTng instrumentation net Mathieu Desnoyers
2008-04-24 15:52 ` Pavel Emelyanov
2008-04-24 16:13 ` Mathieu Desnoyers
2008-04-24 16:30 ` Pavel Emelyanov
2008-04-26 19:38 ` [patch 00/37] Linux Kernel Markers instrumentation for sched-devel.git Peter Zijlstra
2008-04-26 20:11 ` Mathieu Desnoyers
2008-04-27 10:00 ` Peter Zijlstra
2008-05-04 15:08 ` Mathieu Desnoyers
2008-04-28 18:22 ` Ingo Molnar
2008-04-28 18:36 ` Andrew Morton
2008-04-28 18:40 ` Christoph Hellwig
2008-04-28 18:47 ` Andrew Morton
2008-04-28 18:49 ` Christoph Hellwig
2008-04-28 19:01 ` KOSAKI Motohiro
2008-04-28 19:52 ` Peter Zijlstra
2008-04-28 22:25 ` Masami Hiramatsu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080424151352.660545603@polymtl.ca \
--to=mathieu.desnoyers@polymtl.ca \
--cc=akpm@linux-foundation.org \
--cc=akpm@osdl.org \
--cc=fche@redhat.com \
--cc=hpa@zytor.com \
--cc=jeremy@goop.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.