public inbox for linux-kernel@vger.kernel.org
* [PATCH] Fix x86_64 page fault scheduler race
@ 2008-04-22 13:21 Mathieu Desnoyers
  2008-04-22 13:26 ` Ingo Molnar
  0 siblings, 1 reply; 4+ messages in thread
From: Mathieu Desnoyers @ 2008-04-22 13:21 UTC
  To: mingo
  Cc: akpm, H. Peter Anvin, Jeremy Fitzhardinge, Steven Rostedt,
	Frank Ch. Eigler, linux-kernel

> I think you're vastly overestimating what is sane to do from an NMI
> context.  It is utterly and totally insane to assume vmalloc is available
> in NMI.
>
>       -hpa
>

OK, please tell me where I am wrong then. Looking into
arch/x86/mm/fault.c, I see that vmalloc_sync_all() touches pgd_list
entries while the pgd_lock spinlock is held, with interrupts disabled.
So it's protected against concurrent pgd_list modification from:

a - vmalloc_sync_all() on other CPUs
b - local interrupts

However, a completely normal interrupt can arrive on a remote CPU, run
vmalloc_fault() and issue a set_pgd concurrently. Therefore I conclude
that this interrupt disabling is not there to ensure any kind of
protection against concurrent updates.

Also, we see that vmalloc_fault has comments such as:

(for x86_32)
         * Do _not_ use "current" here. We might be inside
         * an interrupt in the middle of a task switch..

So it takes the pgd address from cr3, not from current. Using only the
stack/registers makes this NMI-safe even if "current" is invalid when
the NMI arrives. This matters because __switch_to updates the registers
before updating current_task, without disabling interrupts.
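
To make that window concrete, here is a minimal user-space sketch in C
(not kernel code). It only models the ordering described above, where the
hardware page-table root changes before "current" does. All the *_sketch
names are hypothetical and exist purely for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these are kernel types. */
struct mm_sketch { int pgd_id; };              /* identifies a page-table root */
struct task_sketch { struct mm_sketch *mm; };

struct cpu_sketch {
	int cr3_pgd_id;                 /* root the hardware is actually walking */
	struct task_sketch *current;    /* software notion of the running task */
};

/* Step 1 of the modelled switch: hardware state now belongs to "next". */
static void switch_registers(struct cpu_sketch *cpu, struct task_sketch *next)
{
	assert(next != NULL && next->mm != NULL);
	cpu->cr3_pgd_id = next->mm->pgd_id;
}

/* Step 2: only now does "current" catch up with the hardware state. */
static void switch_current(struct cpu_sketch *cpu, struct task_sketch *next)
{
	cpu->current = next;
}

/* Lookup that trusts "current"; stale between step 1 and step 2. */
static int pgd_via_current(const struct cpu_sketch *cpu)
{
	return cpu->current->mm->pgd_id;
}

/* Lookup that trusts only the register; always matches the hardware. */
static int pgd_via_cr3(const struct cpu_sketch *cpu)
{
	return cpu->cr3_pgd_id;
}
```

If an NMI fires between switch_registers() and switch_current(),
pgd_via_cr3() names the incoming task's page tables while
pgd_via_current() still names the outgoing task's.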

You are right in that x86_64 does not seem to play as safely as x86_32
on this matter; it uses current->mm. It probably shouldn't assume
"current" is valid. Actually, I don't see where x86_64 disables
interrupts around __switch_to, so this would seem to be a race
condition. Or have I missed something?
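
For reference, the lookup the patch switches to is just an index into the
live top-level table. A rough user-space sketch, assuming 4-level paging
on x86_64 (a PGDIR_SHIFT of 39 and 512 entries per PGD). The *_sketch
names are made up for illustration, and "root" stands in for
__va(read_cr3()):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed x86_64 4-level paging constants, renamed to mark them as a sketch. */
#define PGDIR_SHIFT_SKETCH   39
#define PTRS_PER_PGD_SKETCH  512

/* Hypothetical stand-in for pgd_t: one 8-byte top-level entry. */
typedef struct { uint64_t val; } pgd_sketch_t;

/* Slot in the top-level table that covers the given virtual address. */
static unsigned int pgd_index_sketch(uint64_t address)
{
	return (unsigned int)((address >> PGDIR_SHIFT_SKETCH) &
			      (PTRS_PER_PGD_SKETCH - 1));
}

/*
 * Entry for "address" in the table starting at "root". The addition is
 * done on a typed pointer so the index scales by the entry size.
 */
static pgd_sketch_t *pgd_from_root(pgd_sketch_t *root, uint64_t address)
{
	assert(root != NULL);
	return root + pgd_index_sketch(address);
}
```

Nothing here depends on "current": the only inputs are the table base and
the faulting address.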

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: akpm@osdl.org
CC: mingo@elte.hu
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
---
 arch/x86/mm/fault.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/arch/x86/mm/fault.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/mm/fault.c	2008-04-21 13:54:54.000000000 -0400
+++ linux-2.6-lttng/arch/x86/mm/fault.c	2008-04-21 14:26:12.000000000 -0400
@@ -513,6 +513,7 @@ static int vmalloc_fault(unsigned long a
 		return -1;
 	return 0;
 #else
+	unsigned long pgd_paddr;
 	pgd_t *pgd, *pgd_ref;
 	pud_t *pud, *pud_ref;
 	pmd_t *pmd, *pmd_ref;
@@ -526,7 +527,8 @@ static int vmalloc_fault(unsigned long a
 	   happen within a race in page table update. In the later
 	   case just flush. */
 
-	pgd = pgd_offset(current->mm ?: &init_mm, address);
+	pgd_paddr = read_cr3();
+	pgd = (pgd_t *)__va(pgd_paddr) + pgd_index(address);
 	pgd_ref = pgd_offset_k(address);
 	if (pgd_none(*pgd_ref))
 		return -1;

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [PATCH] Fix x86_64 page fault scheduler race
  2008-04-22 13:21 [PATCH] Fix x86_64 page fault scheduler race Mathieu Desnoyers
@ 2008-04-22 13:26 ` Ingo Molnar
  2008-04-22 14:06   ` Mathieu Desnoyers
  0 siblings, 1 reply; 4+ messages in thread
From: Ingo Molnar @ 2008-04-22 13:26 UTC
  To: Mathieu Desnoyers
  Cc: akpm, H. Peter Anvin, Jeremy Fitzhardinge, Steven Rostedt,
	Frank Ch. Eigler, linux-kernel


* Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> You are right in that x86_64 does not seem to play as safely as
> x86_32 on this matter; it uses current->mm. Probably it shouldn't
> assume "current" is valid. Actually, I don't see where x86_64 disables
> interrupts around __switch_to, so this would seem to be a race
> condition. Or have I missed something?

the scheduler disables interrupts around __switch_to(). (x86 does not 
set __ARCH_WANT_INTERRUPTS_ON_CTXSW)

	Ingo


* Re: [PATCH] Fix x86_64 page fault scheduler race
  2008-04-22 13:26 ` Ingo Molnar
@ 2008-04-22 14:06   ` Mathieu Desnoyers
  2008-04-22 14:19     ` Ingo Molnar
  0 siblings, 1 reply; 4+ messages in thread
From: Mathieu Desnoyers @ 2008-04-22 14:06 UTC
  To: Ingo Molnar
  Cc: akpm, H. Peter Anvin, Jeremy Fitzhardinge, Steven Rostedt,
	Frank Ch. Eigler, linux-kernel

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> 
> > You are right in that x86_64 does not seem to play as safely as
> > x86_32 on this matter; it uses current->mm. Probably it shouldn't
> > assume "current" is valid. Actually, I don't see where x86_64 disables
> > interrupts around __switch_to, so this would seem to be a race
> > condition. Or have I missed something?
> 
> the scheduler disables interrupts around __switch_to(). (x86 does not 
> set __ARCH_WANT_INTERRUPTS_ON_CTXSW)
> 
> 	Ingo


OK, so I guess it's only useful for NMIs then. However, it makes me
wonder why this comment was there in the first place in the x86_32
vmalloc_fault() and why it uses read_cr3():

        * Do _not_ use "current" here. We might be inside
        * an interrupt in the middle of a task switch..

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [PATCH] Fix x86_64 page fault scheduler race
  2008-04-22 14:06   ` Mathieu Desnoyers
@ 2008-04-22 14:19     ` Ingo Molnar
  0 siblings, 0 replies; 4+ messages in thread
From: Ingo Molnar @ 2008-04-22 14:19 UTC
  To: Mathieu Desnoyers
  Cc: akpm, H. Peter Anvin, Jeremy Fitzhardinge, Steven Rostedt,
	Frank Ch. Eigler, linux-kernel


* Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> > the scheduler disables interrupts around __switch_to(). (x86 does 
> > not set __ARCH_WANT_INTERRUPTS_ON_CTXSW)
> 
> OK, so I guess it's only useful for NMIs then. However, it makes me
> wonder why this comment was there in the first place in the x86_32
> vmalloc_fault() and why it uses read_cr3():
> 
>         * Do _not_ use "current" here. We might be inside
>         * an interrupt in the middle of a task switch..

hm, i guess it's still useful to keep the 
__ARCH_WANT_INTERRUPTS_ON_CTXSW case working too. On -rt we used to 
enable it to squeeze a tiny bit more latency out of the system.

	Ingo

