2.6.32 PV Xen domU guest panic on nested call to arch_enter_lazy_mmu

* 2.6.32 PV Xen domU guest panic on nested call to arch_enter_lazy_mmu_mode()
@ 2010-12-08  0:54 Chuck Anderson
  2010-12-08  8:48 ` Jan Beulich
  2010-12-08 22:28 ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 7+ messages in thread
From: Chuck Anderson @ 2010-12-08  0:54 UTC
  To: xen-devel@lists.xensource.com

I'm posting this because I am writing a patch to fix a 2.6.32-based PV 
Xen domU panic caused by a nested call to arch_enter_lazy_mmu_mode() in 
arch/x86/include/asm/paravirt.h (see details below).  The following 
BUG_ON() was triggered:

    arch/x86/kernel/paravirt.c

    static inline void enter_lazy(enum paravirt_lazy_mode mode)
    {
            BUG_ON(percpu_read(paravirt_lazy_mode) != PARAVIRT_LAZY_NONE);

            percpu_write(paravirt_lazy_mode, mode);
    }

because enter_lazy() was called twice, once through mm/memory.c 
copy_pte_range() and a second time through an interrupt path.
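
The failure mode can be reproduced outside the kernel with a small
user-space model.  This is only a sketch: cur_mode stands in for the
per-CPU paravirt_lazy_mode variable, and try_enter_lazy(),
try_leave_lazy() and replay_nested_entry() are hypothetical names that
return false where the real code would hit BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

enum lazy_mode { LAZY_NONE, LAZY_MMU, LAZY_CPU };

/* Stand-in for the per-CPU paravirt_lazy_mode variable (assumption). */
static enum lazy_mode cur_mode = LAZY_NONE;

/* Returns false where the kernel's enter_lazy() would BUG. */
static bool try_enter_lazy(enum lazy_mode mode)
{
    if (cur_mode != LAZY_NONE)
        return false;
    cur_mode = mode;
    return true;
}

/* Returns false where the kernel's leave_lazy() would BUG. */
static bool try_leave_lazy(enum lazy_mode mode)
{
    if (cur_mode != mode)
        return false;
    cur_mode = LAZY_NONE;
    return true;
}

/* Replays the reported sequence: an entry from the fork path, then a
 * second entry from the interrupt path before the first one has left.
 * Returns true if only the nested entry was rejected. */
static bool replay_nested_entry(void)
{
    bool first, nested, left;

    cur_mode = LAZY_NONE;
    first = try_enter_lazy(LAZY_MMU);   /* copy_pte_range() path */
    nested = try_enter_lazy(LAZY_MMU);  /* interrupt path */
    left = try_leave_lazy(LAZY_MMU);    /* outer section finishes */
    assert(cur_mode == LAZY_NONE);
    return first && !nested && left;
}
```

Calling replay_nested_entry() returns true in this model: the first
entry succeeds and the second one is the point where the guest panics.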

The easy fix is to disable interrupts in copy_pte_range() before calling 
arch_enter_lazy_mmu_mode() and re-enable them after the call to 
arch_leave_lazy_mmu_mode(), but I'm asking whether there is a better way 
to handle this.  If disabling interrupts is best, there are other calls 
to arch_enter_lazy_mmu_mode() that appear to have the same interruption 
issue, so it may be better to disable interrupts in 
arch_enter_lazy_mmu_mode() or paravirt_enter_lazy_mmu() itself.

Here is how the nested call to arch_enter_lazy_mmu_mode() was made.  The 
first call path is:

    do_fork()
      copy_process()
        dup_mm()
          dup_mmap()
            copy_page_range()
              copy_pud_range()
                copy_pmd_range()
                  copy_pte_range()
                    arch_enter_lazy_mmu_mode()
                      paravirt_enter_lazy_mmu()
                        enter_lazy()

Control then returns up to mm/memory.c copy_pte_range(), and the guest 
is interrupted in that function.  Here is the edited interrupt call 
stack that gets us to arch_enter_lazy_mmu_mode() for the second time 
without an intervening arch_leave_lazy_mmu_mode(), triggering the 
BUG_ON() in enter_lazy():

    xen_evtchn_do_upcall()
     handle_irq()
       blkif_interrupt()
         do_blkif_request()
           blkif_queue_request()
             gnttab_alloc_grant_references()
               get_free_entries()
                 gnttab_expand()
                   gnttab_map()
                     arch_gnttab_map_shared()
                       apply_to_page_range(... map_pte_fn ...)

We get to enter_lazy() downstream from apply_to_page_range():

    apply_to_page_range(... map_pte_fn ...)
      apply_to_pud_range(... map_pte_fn ...)
        apply_to_pmd_range(... map_pte_fn ...)
           apply_to_pte_range(... map_pte_fn ...)
             arch_enter_lazy_mmu_mode()
               paravirt_enter_lazy_mmu()
                 enter_lazy()

The spin locks acquired indirectly through mm/memory.c copy_pte_range() 
are obtained with spin_lock() and spin_acquire(), which I believe do not 
disable interrupts.

Thanks,
Chuck

* Re: 2.6.32 PV Xen domU guest panic on nested call to arch_enter_lazy_mmu_mode()
  2010-12-08  0:54 2.6.32 PV Xen domU guest panic on nested call to arch_enter_lazy_mmu_mode() Chuck Anderson
@ 2010-12-08  8:48 ` Jan Beulich
  2010-12-08 21:21   ` Jeremy Fitzhardinge
  2010-12-08 22:28 ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2010-12-08  8:48 UTC
  To: Chuck Anderson; +Cc: xen-devel@lists.xensource.com

>>> On 08.12.10 at 01:54, Chuck Anderson <chuck.anderson@oracle.com> wrote:
> I'm posting this because I am writing a patch to fix a 2.6.32 based PV 
> Xen domU panic due to a nested call to arch/x86/include/asm/paravirt.h 
> arch_enter_lazy_mmu_mode() (see details below).  The following BUG_ON() 
> was triggered:
> 
>     arch/x86/kernel/paravirt.c
> 
>     static inline void enter_lazy(enum paravirt_lazy_mode mode)
>     {
>             BUG_ON(percpu_read(paravirt_lazy_mode) != PARAVIRT_LAZY_NONE);
> 
>             percpu_write(paravirt_lazy_mode, mode);
>     }
> 
> because enter_lazy() was called twice, once through mm/memory.c 
> copy_pte_range() and a second time through an interrupt path.
> 
> The easy fix is to disable interrupts in copy_pte_range() before calling 
> arch_enter_lazy_mmu_mode() and re-enable them after the call to 
> arch_leave_lazy_mmu_mode() but I'm asking if there is a better way to 
> handle this.  If disabling interrupts is best, there are other calls to 
> arch_enter_lazy_mmu_mode() that appear to have the same interruption 
> issue.  It may be best then to disable interrupts in 
> arch_enter_lazy_mmu_mode() or paravirt_enter_lazy_mmu().

I don't think this is an option, as the period of time for which you
would disable interrupts could be pretty much unbounded.

Instead (being a performance optimization only anyway)
the BUG_ON() could be removed (accepting that the
interrupted sequence would not batch any further
hypercalls, and provided all of this stuff can actually be
used in a nested way), the flag could be converted to a
counter (again provided nesting is okay here in the first
place), or a filter could be applied when actually checking
whether to batch (which is what we do in our non-pvops
kernels: in IRQ context, no batching happens).
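
As a rough illustration of the last two options, here is a user-space
sketch of a nesting counter combined with an IRQ-context filter.  The
names lazy_depth, enter_lazy_counted(), leave_lazy_counted() and
should_batch() are hypothetical, the in_irq argument stands in for a
real interrupt-context check, and the sketch assumes that nesting is
acceptable at all, which is the open question noted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical nesting counter that replaces the single mode flag. */
static int lazy_depth;

static void enter_lazy_counted(void)
{
    lazy_depth++;
}

/* Returns true when the outermost lazy section ends, which is the
 * point where queued updates would have to be flushed. */
static bool leave_lazy_counted(void)
{
    assert(lazy_depth > 0);   /* an unbalanced leave is a caller bug */
    lazy_depth--;
    return lazy_depth == 0;
}

/* Filter variant: never batch in IRQ context, whatever the counter
 * says.  in_irq models an interrupt-context check (assumption). */
static bool should_batch(bool in_irq)
{
    return !in_irq && lazy_depth > 0;
}
```

With this model a nested enter/leave pair no longer trips an assertion,
and only the outermost leave reports that a flush is due.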

Jan

* Re: 2.6.32 PV Xen domU guest panic on nested call to arch_enter_lazy_mmu_mode()
  2010-12-08  8:48 ` Jan Beulich
@ 2010-12-08 21:21   ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-08 21:21 UTC
  To: Jan Beulich; +Cc: xen-devel@lists.xensource.com, Chuck Anderson

On 12/08/2010 12:48 AM, Jan Beulich wrote:
>>>> On 08.12.10 at 01:54, Chuck Anderson <chuck.anderson@oracle.com> wrote:
>> I'm posting this because I am writing a patch to fix a 2.6.32 based PV 
>> Xen domU panic due to a nested call to arch/x86/include/asm/paravirt.h 
>> arch_enter_lazy_mmu_mode() (see details below).  The following BUG_ON() 
>> was triggered:
>>
>>     arch/x86/kernel/paravirt.c
>>
>>     static inline void enter_lazy(enum paravirt_lazy_mode mode)
>>     {
>>             BUG_ON(percpu_read(paravirt_lazy_mode) != PARAVIRT_LAZY_NONE);
>>
>>             percpu_write(paravirt_lazy_mode, mode);
>>     }
>>
>> because enter_lazy() was called twice, once through mm/memory.c 
>> copy_pte_range() and a second time through an interrupt path.
>>
>> The easy fix is to disable interrupts in copy_pte_range() before calling 
>> arch_enter_lazy_mmu_mode() and re-enable them after the call to 
>> arch_leave_lazy_mmu_mode() but I'm asking if there is a better way to 
>> handle this.  If disabling interrupts is best, there are other calls to 
>> arch_enter_lazy_mmu_mode() that appear to have the same interruption 
>> issue.  It may be best then to disable interrupts in 
>> arch_enter_lazy_mmu_mode() or paravirt_enter_lazy_mmu().
> I don't think this is an option, as the period of time for which you
> would disable interrupts could be pretty much unbounded.
>
> Instead (being a performance optimization only anyway)
> the BUG_ON() could be removed (accepting that the
> interrupted sequence would not batch any further
> hypercalls, and provided all of this stuff can actually be
> used in a nested way), the flag could be converted to a
> counter (again provided nesting is okay here in the first
> place), or a filter could be applied when actually checking
> whether to batch (which is what we do in our non-pvops
> kernels: in IRQ context, no batching happens).

That's what happens in pvops kernels too - batching is disabled in
interrupt context so that (for example) vmalloc pagefault pte updates
aren't deferred.

Looks like enter/leave lazy should just be a no-op in interrupt context too.

Though I'm surprised it has taken so long for this to appear.

    J

* Re: 2.6.32 PV Xen domU guest panic on nested call to arch_enter_lazy_mmu_mode()
  2010-12-08  0:54 2.6.32 PV Xen domU guest panic on nested call to arch_enter_lazy_mmu_mode() Chuck Anderson
  2010-12-08  8:48 ` Jan Beulich
@ 2010-12-08 22:28 ` Jeremy Fitzhardinge
  2010-12-09  1:21   ` Chuck Anderson
  1 sibling, 1 reply; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-08 22:28 UTC
  To: Chuck Anderson; +Cc: xen-devel@lists.xensource.com, Jan Beulich

On 12/07/2010 04:54 PM, Chuck Anderson wrote:
> The easy fix is to disable interrupts in copy_pte_range() before
> calling arch_enter_lazy_mmu_mode() and re-enable them after the call
> to arch_leave_lazy_mmu_mode() but I'm asking if there is a better way
> to handle this.  If disabling interrupts is best, there are other
> calls to arch_enter_lazy_mmu_mode() that appear to have the same
> interruption issue.  It may be best then to disable interrupts in
> arch_enter_lazy_mmu_mode() or paravirt_enter_lazy_mmu().

Disabling interrupts would cause too much latency.  I think we may have
done this at one point, but it is very antisocial.

Since lazy mode is effectively disabled in interrupt handlers anyway, it
should just be enough to ignore enter/leave requests.  Does this work
for you?

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date: Wed, 8 Dec 2010 14:21:16 -0800
Subject: [PATCH] x86/paravirt: don't enter/leave lazy mode in interrupts.

We already ignore the current state of lazy mode in interrupts, but we
should also ignore any attempt to enter/leave lazy mode within
an interrupt context.

enter_lazy() will BUG if it sees an attempt at a nested entry to lazy
mode, which is generally an error.  However, it's possible that an
interrupt handler may do something that would trigger a batched MMU
update, for example, and that could interrupt an existing batched update.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reported-by: Chuck Anderson <chuck.anderson@oracle.com>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Stable Kernel <stable@kernel.org>

diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index c5b2500..a2ad10d 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -231,6 +231,9 @@ static DEFINE_PER_CPU(enum paravirt_lazy_mode, paravirt_lazy_mode) = PARAVIRT_LA
 
 static inline void enter_lazy(enum paravirt_lazy_mode mode)
 {
+	if (in_interrupt())
+		return;
+
 	BUG_ON(percpu_read(paravirt_lazy_mode) != PARAVIRT_LAZY_NONE);
 
 	percpu_write(paravirt_lazy_mode, mode);
@@ -238,6 +241,9 @@ static inline void enter_lazy(enum paravirt_lazy_mode mode)
 
 static void leave_lazy(enum paravirt_lazy_mode mode)
 {
+	if (in_interrupt())
+		return;
+
 	BUG_ON(percpu_read(paravirt_lazy_mode) != mode);
 
 	percpu_write(paravirt_lazy_mode, PARAVIRT_LAZY_NONE);
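
The effect of the two early returns can be checked with a user-space
model of the patched functions.  This is a sketch under stated
assumptions, not the kernel code: in_irq is a plain variable standing in
for in_interrupt(), cur_mode stands in for the per-CPU variable, and
bug_hits counts the places where BUG_ON() would have fired instead of
halting.

```c
#include <assert.h>
#include <stdbool.h>

enum lazy_mode { LAZY_NONE, LAZY_MMU, LAZY_CPU };

static enum lazy_mode cur_mode;  /* models per-CPU paravirt_lazy_mode */
static bool in_irq;              /* models in_interrupt() (assumption) */
static int bug_hits;             /* counts would-be BUG_ON() failures */

static void enter_lazy(enum lazy_mode mode)
{
    if (in_irq)
        return;                  /* patched behaviour: ignore in IRQ */
    if (cur_mode != LAZY_NONE)
        bug_hits++;
    cur_mode = mode;
}

static void leave_lazy(enum lazy_mode mode)
{
    if (in_irq)
        return;                  /* patched behaviour: ignore in IRQ */
    if (cur_mode != mode)
        bug_hits++;
    cur_mode = LAZY_NONE;
}

/* Replays the reported sequence and returns the number of BUG hits. */
static int replay_with_patch(void)
{
    cur_mode = LAZY_NONE;
    in_irq = false;
    bug_hits = 0;

    enter_lazy(LAZY_MMU);        /* copy_pte_range() path */
    in_irq = true;               /* block interrupt arrives */
    enter_lazy(LAZY_MMU);        /* apply_to_pte_range() path */
    leave_lazy(LAZY_MMU);
    in_irq = false;              /* interrupt returns */
    assert(cur_mode == LAZY_MMU);  /* outer state was left untouched */
    leave_lazy(LAZY_MMU);
    return bug_hits;
}
```

In this model replay_with_patch() returns 0, and the interrupted
section still sees LAZY_MMU when the handler returns.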


Thanks,
	J

* Re: 2.6.32 PV Xen domU guest panic on nested call to arch_enter_lazy_mmu_mode()
  2010-12-08 22:28 ` Jeremy Fitzhardinge
@ 2010-12-09  1:21   ` Chuck Anderson
  2010-12-09  6:50     ` Chuck Anderson
  2010-12-09 17:43     ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 7+ messages in thread
From: Chuck Anderson @ 2010-12-09  1:21 UTC
  To: Jeremy Fitzhardinge; +Cc: xen-devel@lists.xensource.com, Jan Beulich

Jeremy,
  Is the following sequence possible?  An ongoing lazy mode update has 
batched some MMU updates; an interrupt occurs; an interrupt routine does 
a non-lazy MMU update for a PTE that is also in the lazy update queue; 
and that update is then overwritten on return from the interrupt when 
the update queue is flushed.  Or are the PTE updates protected by a 
lock?  If they are, wouldn't we deadlock in the interrupt routine when 
it tries to obtain that (I assume) spinlock?
Chuck

Jeremy Fitzhardinge wrote:
> Disabling interrupts would cause too much latency.  I think we may have
> done this at one point, but it is very antisocial.
>
> Since lazy mode is effectively disabled in interrupt handlers anyway, it
> should just be enough to ignore enter/leave requests.  Does this work
> for you?
>
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Date: Wed, 8 Dec 2010 14:21:16 -0800
> Subject: [PATCH] x86/paravirt: don't enter/leave lazy mode in interrupts.
>
> We already ignore the current state of lazy mode in interrupts, but we
> should also ignore any attempt to enter/leave lazy mode within
> an interrupt context.
>
> enter_lazy() will BUG if it sees an attempt at a nested entry to lazy
> mode, which is generally an error.  However, it's possible that an
> interrupt handler may do something that would trigger a batched MMU
> update, for example, and that could interrupt an existing batched update.

* Re: 2.6.32 PV Xen domU guest panic on nested call to arch_enter_lazy_mmu_mode()
  2010-12-09  1:21   ` Chuck Anderson
@ 2010-12-09  6:50     ` Chuck Anderson
  2010-12-09 17:43     ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 7+ messages in thread
From: Chuck Anderson @ 2010-12-09  6:50 UTC
  To: Jeremy Fitzhardinge; +Cc: xen-devel@lists.xensource.com, Jan Beulich

Jeremy,
  Looking at copy_pte_range(), the stale update scenario I described 
below can't happen.  I believe the deadlock could happen but that is not 
a lazy/not lazy MMU update issue.

Here is an extract from your proposed patch:

 static inline void enter_lazy(enum paravirt_lazy_mode mode)
 {
+    if (in_interrupt())
+        return;
+
     BUG_ON(percpu_read(paravirt_lazy_mode) != PARAVIRT_LAZY_NONE);

My vote is for something like:

 static inline void enter_lazy(enum paravirt_lazy_mode mode)
 {
-       BUG_ON(percpu_read(paravirt_lazy_mode) != PARAVIRT_LAZY_NONE);
+       /*
+        * Switch modes only if we are not in an interrupt context.
+        * The mode is ignored while handling an interrupt.
+        */
+       if (!in_interrupt()) {
+               BUG_ON(percpu_read(paravirt_lazy_mode) !=
+                      PARAVIRT_LAZY_NONE);

-       percpu_write(paravirt_lazy_mode, mode);
+               percpu_write(paravirt_lazy_mode, mode);
+       }
 }

 static void leave_lazy(enum paravirt_lazy_mode mode)
 {
-        BUG_ON(percpu_read(paravirt_lazy_mode) != mode);
+      /*
+       * Switch modes only if we are not in an interrupt context.
+       * The mode is ignored while handling an interrupt.
+       */
+      if (!in_interrupt()) {
+              BUG_ON(percpu_read(paravirt_lazy_mode) != mode);
 
-        percpu_write(paravirt_lazy_mode, PARAVIRT_LAZY_NONE);
+              percpu_write(paravirt_lazy_mode, PARAVIRT_LAZY_NONE);
+       }
 }

Thanks,
Chuck

Chuck Anderson wrote:
> Jeremy,
>  Is it possible for an ongoing lazy mode update to have batched some 
> MMU updates; an interrupt occurs; an interrupt routine does a non-lazy 
> MMU update for a PTE that is also in the lazy update queue; that 
> update is overwritten on return from the interrupt when the update 
> queue is flushed?  Or are the PTE updates protected by a lock?  If 
> they are, wouldn't we deadlock in the interrupt routine when it tries 
> to obtain that (I assume) spinlock?
> Chuck

* Re: 2.6.32 PV Xen domU guest panic on nested call to arch_enter_lazy_mmu_mode()
  2010-12-09  1:21   ` Chuck Anderson
  2010-12-09  6:50     ` Chuck Anderson
@ 2010-12-09 17:43     ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-09 17:43 UTC
  To: Chuck Anderson; +Cc: xen-devel@lists.xensource.com, Jan Beulich

On 12/08/2010 05:21 PM, Chuck Anderson wrote:
> Jeremy,
>  Is it possible for an ongoing lazy mode update to have batched some
> MMU updates; an interrupt occurs; an interrupt routine does a non-lazy
> MMU update for a PTE that is also in the lazy update queue; that
> update is overwritten on return from the interrupt when the update
> queue is flushed?  Or are the PTE updates protected by a lock?  If
> they are, wouldn't we deadlock in the interrupt routine when it tries
> to obtain that (I assume) spinlock?

The kernel-wide rule is that to update a usermode pte, you must be
holding the appropriate pte lock.  The pte lock is not interrupt safe,
so it is never correct to do a usermode pte update from interrupt context.

Kernel pte updates don't have any particular lock associated with them;
each subsystem generally has its own locking scheme to serialize the
updates if necessary.  Overall the kernel's mappings aren't changed very
often, except for specific things like kmap, vmalloc, page attributes, etc.

So the circumstances you point out would be bugs regardless of whether
Xen or lazy mmu updates are in effect.  Lazy updates rely on those rules
being correctly enforced (in particular, it is never correct to be in
lazy mmu update mode for usermode ptes without holding the pte lock).

    J
