* [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
@ 2014-01-27 17:33 Oleksandr Tyshchenko
From: Oleksandr Tyshchenko @ 2014-01-27 17:33 UTC
To: xen-devel
Hi all,

We are trying to bring up Xen on the DRA7XX (OMAP5) platform.

We sometimes see hangs in the hypervisor, and these hangs are related to SMP.
We found out that a deadlock takes place in the on_selected_cpus function
when cross-interrupts occur simultaneously.
The issue:

1. We receive IRQs on the first CPU (say CPU0) and on the second CPU (say CPU1) in parallel.
2. In our case the maintenance_interrupt handler for the maintenance IRQ from CPU0 is executed on CPU1, and
   the handler for the IRQ from CPU1 is executed on CPU0.
3. According to the existing logic, gic_irq_eoi has to run on the CPU on which the IRQ was scheduled.
4. Because of this, in both cases we need to call on_selected_cpus to EOI the IRQs.
5. On CPU0, on_selected_cpus is called; it takes the lock at the beginning of the function
   and continues executing.
6. In parallel, the same function is called on CPU1, which blocks trying to take the lock because it is already held.
7. CPU0 sends the IPI and busy-waits until CPU1 has executed the function and cleared the cpumask.
8. But the mask will never be cleared, because CPU1 is waiting too.

Now we have the following situation: CPU0 cannot exit its busy loop because it is waiting for CPU1 to execute
the function and clear the mask, while CPU1 is waiting for CPU0 to release the lock.
This results in a deadlock; the sketch below shows where each CPU gets stuck.
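For reference, here is the relevant logic of on_selected_cpus, condensed from
xen/common/smp.c (the nr_cpus == 0 early exit and the call_data declarations
are omitted; the annotations map to the steps above):

    /* Condensed from xen/common/smp.c (Xen 4.4-unstable); call_data is
     * the global structure holding selected/func/info/wait. */
    static DEFINE_SPINLOCK(call_lock);

    void on_selected_cpus(
        const cpumask_t *selected,
        void (*func) (void *info),
        void *info,
        int wait)
    {
        ASSERT(local_irq_is_enabled());

        spin_lock(&call_lock);      /* step 6: CPU1 spins here forever */

        cpumask_copy(&call_data.selected, selected);
        call_data.func = func;
        call_data.info = info;
        call_data.wait = wait;

        smp_send_call_function_mask(&call_data.selected);

        /* step 7: CPU0 spins here; the mask is only cleared by
         * smp_call_function_interrupt() running on CPU1, which is
         * stuck on the spin_lock above -- hence the deadlock. */
        while ( !cpumask_empty(&call_data.selected) )
            cpu_relax();

        spin_unlock(&call_lock);
    }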
Since we needed a solution to avoid these hangs, the attached patches were created. The idea is simply to
call smp_call_function_interrupt if the lock is already held. This lets the waiting CPU service the pending
call, exit its busy loop and release the lock.
But I am afraid this solution is incomplete and maybe not enough for stable operation. I would appreciate it
if you could explain how to solve the issue in the right way, or give some advice.
P.S. We use the following software:
1. Hypervisor - XEN 4.4 unstable
2. Dom0 - Kernel 3.8
3. DomU - Kernel 3.8
Oleksandr Tyshchenko (2):
xen/arm: Add return value to smp_call_function_interrupt function
xen/arm: Fix deadlock in on_selected_cpus function
 xen/common/smp.c      |   13 ++++++++++---
 xen/include/xen/smp.h |    2 +-
 2 files changed, 11 insertions(+), 4 deletions(-)
--
1.7.9.5

* [PATCH v1 1/2] xen/arm: Add return value to smp_call_function_interrupt function
From: Oleksandr Tyshchenko @ 2014-01-27 17:33 UTC
To: xen-devel

Let the function return an error if the action cannot be executed.

Change-Id: Iace691850024656239326bf0e3c87b57cb1b8ab3
Signed-off-by: Oleksandr Tyshchenko <oleksandr.tyshchenko@globallogic.com>
Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
---
 xen/common/smp.c      |    7 +++++--
 xen/include/xen/smp.h |    2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/xen/common/smp.c b/xen/common/smp.c
index 482a203..2700bd7 100644
--- a/xen/common/smp.c
+++ b/xen/common/smp.c
@@ -20,6 +20,7 @@
 #include <asm/processor.h>
 #include <xen/spinlock.h>
 #include <xen/smp.h>
+#include <xen/errno.h>
 
 /*
  * Structure and data for smp_call_function()/on_selected_cpus().
@@ -75,14 +76,14 @@ out:
     spin_unlock(&call_lock);
 }
 
-void smp_call_function_interrupt(void)
+int smp_call_function_interrupt(void)
 {
     void (*func)(void *info) = call_data.func;
     void *info = call_data.info;
     unsigned int cpu = smp_processor_id();
 
     if ( !cpumask_test_cpu(cpu, &call_data.selected) )
-        return;
+        return -EPERM;
 
     irq_enter();
 
@@ -100,6 +101,8 @@ void smp_call_function_interrupt(void)
     }
 
     irq_exit();
+
+    return 0;
 }
 
 /*
diff --git a/xen/include/xen/smp.h b/xen/include/xen/smp.h
index 6febb56..6d05910 100644
--- a/xen/include/xen/smp.h
+++ b/xen/include/xen/smp.h
@@ -61,7 +61,7 @@ static inline void on_each_cpu(
 /*
  * Call a function on the current CPU
  */
-void smp_call_function_interrupt(void);
+int smp_call_function_interrupt(void);
 
 void smp_send_call_function_mask(const cpumask_t *mask);
 
-- 
1.7.9.5

* Re: [PATCH v1 1/2] xen/arm: Add return value to smp_call_function_interrupt function
From: Stefano Stabellini @ 2014-01-27 18:28 UTC
To: Oleksandr Tyshchenko; +Cc: xen-devel

On Mon, 27 Jan 2014, Oleksandr Tyshchenko wrote:
> Let the function return an error if the action cannot be executed.

Unless you make the calling function (do_sgi) check for the return
value, this patch won't change anything.

[...]

* [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
From: Oleksandr Tyshchenko @ 2014-01-27 17:33 UTC
To: xen-devel

This patch is needed to avoid possible deadlocks when cross-interrupts
occur simultaneously.

Change-Id: I574b496442253a7b67a27e2edd793526c8131284
Signed-off-by: Oleksandr Tyshchenko <oleksandr.tyshchenko@globallogic.com>
Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
---
 xen/common/smp.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/common/smp.c b/xen/common/smp.c
index 2700bd7..46d2fc6 100644
--- a/xen/common/smp.c
+++ b/xen/common/smp.c
@@ -55,7 +55,11 @@ void on_selected_cpus(
 
     ASSERT(local_irq_is_enabled());
 
-    spin_lock(&call_lock);
+    if (!spin_trylock(&call_lock)) {
+        if (smp_call_function_interrupt())
+            return;
+        spin_lock(&call_lock);
+    }
 
     cpumask_copy(&call_data.selected, selected);
 
-- 
1.7.9.5

* Re: [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
From: Stefano Stabellini @ 2014-01-27 19:00 UTC
To: Oleksandr Tyshchenko; +Cc: xen-devel

On Mon, 27 Jan 2014, Oleksandr Tyshchenko wrote:
> This patch is needed to avoid possible deadlocks when cross-interrupts
> occur simultaneously.
[...]
> -    spin_lock(&call_lock);
> +    if (!spin_trylock(&call_lock)) {
> +        if (smp_call_function_interrupt())
> +            return;
> +        spin_lock(&call_lock);
> +    }
> 
>     cpumask_copy(&call_data.selected, selected);

So this is where you check for the return value of
smp_call_function_interrupt.

I think it would be better to move the on_selected_cpus call out of
maintenance_interrupt, after the write to EOIR (caused by
desc->handler->end(desc) at the end of xen/arch/arm/irq.c:do_IRQ).
Maybe to a tasklet.

Alternatively, as Ian suggested, we could increase the priority of SGIs,
but I am a bit wary of making that change at RC2.

* Re: [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
From: Ian Campbell @ 2014-01-28 10:03 UTC
To: Stefano Stabellini; +Cc: Oleksandr Tyshchenko, xen-devel

On Mon, 2014-01-27 at 19:00 +0000, Stefano Stabellini wrote:
> Alternatively, as Ian suggested, we could increase the priority of SGIs,
> but I am a bit wary of making that change at RC2.

I'm leaning the other way -- I'm wary of open coding magic locking
primitives to work around this issue on a case by case basis. It's just
too subtle IMHO.

The IPI and cross CPU calling primitives are basically predicated on
those IPIs interrupting normal interrupt handlers.

Ian.

* Re: [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
From: Stefano Stabellini @ 2014-01-28 14:00 UTC
To: Ian Campbell; +Cc: Oleksandr Tyshchenko, xen-devel

On Tue, 28 Jan 2014, Ian Campbell wrote:
> I'm leaning the other way -- I'm wary of open coding magic locking
> primitives to work around this issue on a case by case basis. It's just
> too subtle IMHO.
>
> The IPI and cross CPU calling primitives are basically predicated on
> those IPIs interrupting normal interrupt handlers.

The problem is that we don't know if we can properly context switch
nested interrupts. Also I would need to think harder about whether
everything would work correctly, without hitches, with multiple SGIs
happening simultaneously (with more than 2 cpus involved).

On the other hand we know that both Oleksandr's and my solution should
work OK with no surprises if implemented correctly.

* Re: [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
From: Ian Campbell @ 2014-01-28 15:05 UTC
To: Stefano Stabellini; +Cc: Oleksandr Tyshchenko, xen-devel

On Tue, 2014-01-28 at 14:00 +0000, Stefano Stabellini wrote:
> The problem is that we don't know if we can properly context switch
> nested interrupts.

What do you mean? We don't have to context switch an IPI.

> Also I would need to think harder about whether everything
> would work correctly, without hitches, with multiple SGIs happening
> simultaneously (with more than 2 cpus involved).

Since all IPIs would be at the same higher priority, only one will be
active on each CPU at a time. If you are worried about multiple CPUs
then that is already an issue today, just at a lower priority.

I have hacked the IPI priority to be higher in the past and it worked
fine, I just never got round to cleaning it up for submission (I hadn't
thought of the locking thing and my use case was low priority).

The interrupt entry and exit paths were written with nested interrupts
in mind, and they have to be in order to handle interrupts which occur
from both guest and hypervisor context.

> On the other hand we know that both Oleksandr's and my solution should
> work OK with no surprises if implemented correctly.

That's a big if in my mind; any use of trylock is very subtle IMHO.

AIUI this issue only occurs with "proto device assignment" patches added
to 4.4, in which case I think the solution can wait until 4.5 and can be
done properly via the IPI priority fix.

Ian.

* Re: [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
From: Stefano Stabellini @ 2014-01-28 16:02 UTC
To: Ian Campbell; +Cc: Oleksandr Tyshchenko, xen-devel

On Tue, 28 Jan 2014, Ian Campbell wrote:
> What do you mean? We don't have to context switch an IPI.

Sorry, I meant save/restore registers, stack pointer, processor mode,
etc, for nested interrupts.

> Since all IPIs would be at the same higher priority, only one will be
> active on each CPU at a time. If you are worried about multiple CPUs
> then that is already an issue today, just at a lower priority.

That is correct, but if we moved the on_selected_cpus call out of the
interrupt handler I don't think the problem could occur.

> AIUI this issue only occurs with "proto device assignment" patches added
> to 4.4, in which case I think the solution can wait until 4.5 and can be
> done properly via the IPI priority fix.

I think this is a pretty significant problem; even if we don't commit a
fix, we should post a proper patch that we deem acceptable and link it to
the wiki so that anybody that needs it can find it and be sure that it
works correctly.

In my opinion if we go to this length then we might as well commit it
(if it doesn't touch common code of course), but I'll leave the decision
up to you and George.

Given the constraints, the solution I would feel more comfortable with at
this time is something like the following patch (lightly tested):

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index e6257a7..b00ca73 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -58,6 +58,25 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
 
 static unsigned nr_lrs;
 
+static void gic_irq_eoi(void *info)
+{
+    int virq = (uintptr_t) info;
+    GICC[GICC_DIR] = virq;
+}
+
+static DEFINE_PER_CPU(struct pending_irq*, eoi_irq);
+static void eoi_action(unsigned long unused)
+{
+    struct pending_irq *p = this_cpu(eoi_irq);
+    ASSERT(p != NULL);
+
+    on_selected_cpus(cpumask_of(p->desc->arch.eoi_cpu),
+                     gic_irq_eoi, (void*)(uintptr_t)p->desc->irq, 0);
+
+    this_cpu(eoi_irq) = NULL;
+}
+static DECLARE_TASKLET(eoi_tasklet, eoi_action, 0);
+
 /* The GIC mapping of CPU interfaces does not necessarily match the
  * logical CPU numbering. Let's use mapping as returned by the GIC
  * itself
@@ -897,12 +916,6 @@ int gicv_setup(struct domain *d)
 
 }
 
-static void gic_irq_eoi(void *info)
-{
-    int virq = (uintptr_t) info;
-    GICC[GICC_DIR] = virq;
-}
-
 static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
 {
     int i = 0, virq, pirq = -1;
@@ -962,8 +975,11 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
         if ( cpu == smp_processor_id() )
             gic_irq_eoi((void*)(uintptr_t)pirq);
         else
-            on_selected_cpus(cpumask_of(cpu),
-                             gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
+        {
+            ASSERT(this_cpu(eoi_irq) == NULL);
+            this_cpu(eoi_irq) = p;
+            tasklet_schedule(&eoi_tasklet);
+        }
 
         i++;

* Re: [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
From: Ian Campbell @ 2014-01-28 16:12 UTC
To: Stefano Stabellini; +Cc: Oleksandr Tyshchenko, xen-devel

On Tue, 2014-01-28 at 16:02 +0000, Stefano Stabellini wrote:
> Sorry, I meant save/restore registers, stack pointer, processor mode,
> etc, for nested interrupts.

Right, we do handle that actually, since it is the same code as handles
an IRQ during a hypercall (since a hypercall is just another type of
trap).

> That is correct, but if we moved the on_selected_cpus call out of the
> interrupt handler I don't think the problem could occur.

Sure, I'm not opposed to fixing this issue by making an architectural
improvement to the code which happens to not use on_selected_cpus.

> I think this is a pretty significant problem; even if we don't commit a
> fix, we should post a proper patch that we deem acceptable and link it to
> the wiki so that anybody that needs it can find it and be sure that it
> works correctly.

FWIW I have an almost-fix to the IPI priority thing which I would intend
to fix in 4.5 regardless of whether this issue needs it or not.

> Given the constraints, the solution I would feel more comfortable with at
> this time is something like the following patch (lightly tested):

I don't think this buys you anything over just fixing on_selected_cpus
to work as it should, unless there is some reason why deferring this
work to a tasklet is the logically correct thing to do and/or a better
design?

(IOW if you are only doing this to avoid calling on_selected_cpus in
interrupt context then I think this is the wrong fix.)

Ian.

[...]

* Re: [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
From: Stefano Stabellini @ 2014-01-28 16:23 UTC
To: Ian Campbell; +Cc: Oleksandr Tyshchenko, xen-devel

On Tue, 28 Jan 2014, Ian Campbell wrote:
> Right, we do handle that actually, since it is the same code as handles
> an IRQ during a hypercall (since a hypercall is just another type of
> trap).

Ah right, good.

> FWIW I have an almost-fix to the IPI priority thing which I would intend
> to fix in 4.5 regardless of whether this issue needs it or not.

Well, in that case, if you can test the patch in the specific case where
an SGI interrupts a lower priority interrupt, it might be worth sending
out the patch now.

> I don't think this buys you anything over just fixing on_selected_cpus
> to work as it should, unless there is some reason why deferring this
> work to a tasklet is the logically correct thing to do and/or a better
> design?
>
> (IOW if you are only doing this to avoid calling on_selected_cpus in
> interrupt context then I think this is the wrong fix.)

I feel that on_selected_cpus is the kind of work that belongs to the
"bottom half". I was never very happy to have the call where it is now
in the first place.

This approach would have the benefit of allowing us to receive regular
interrupts while waiting for other cpus to handle the SGI. Also I am
sure that it would work even in cases where you have more than one SGI
targeting the same cpu simultaneously.

[...]

* Re: [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
From: Stefano Stabellini @ 2014-01-28 13:58 UTC
To: Oleksandr Tyshchenko; +Cc: xen-devel

On Mon, 27 Jan 2014, Oleksandr Tyshchenko wrote:
> This patch is needed to avoid possible deadlocks when cross-interrupts
> occur simultaneously.
[...]
> -    spin_lock(&call_lock);
> +    if (!spin_trylock(&call_lock)) {
> +        if (smp_call_function_interrupt())
> +            return;
> +        spin_lock(&call_lock);
> +    }

If smp_call_function_interrupt returns -EPERM, shouldn't we go back to
spinning on call_lock?

Also there is a race condition between spin_lock, cpumask_copy and
smp_call_function_interrupt: smp_call_function_interrupt could be called
on cpu1 after cpu0 acquired the lock, but before cpu0 set
call_data.selected.

I think the correct implementation would be:

    while ( unlikely(!spin_trylock(&call_lock)) )
        smp_call_function_interrupt();
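Spelled out, the suggested entry path for on_selected_cpus would look roughly
like this (a sketch assuming patch 1/2 is applied, so that
smp_call_function_interrupt returns -EPERM when the current CPU is not in
call_data.selected):

    void on_selected_cpus(
        const cpumask_t *selected,
        void (*func) (void *info),
        void *info,
        int wait)
    {
        ASSERT(local_irq_is_enabled());

        /* Keep servicing pending cross-CPU calls until we own the lock.
         * If this CPU is not (or not yet) in call_data.selected, the
         * handler returns -EPERM and we simply try the lock again, so we
         * never return without having performed our own cross-call. */
        while ( unlikely(!spin_trylock(&call_lock)) )
            smp_call_function_interrupt();

        cpumask_copy(&call_data.selected, selected);

        /* ... rest of the function unchanged ... */
    }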

* Re: [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
From: Oleksandr Tyshchenko @ 2014-01-30 11:58 UTC
To: Stefano Stabellini; +Cc: xen-devel

On Tue, Jan 28, 2014 at 3:58 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> If smp_call_function_interrupt returns -EPERM, shouldn't we go back to
> spinning on call_lock?
>
> Also there is a race condition between spin_lock, cpumask_copy and
> smp_call_function_interrupt: smp_call_function_interrupt could be called
> on cpu1 after cpu0 acquired the lock, but before cpu0 set
> call_data.selected.
>
> I think the correct implementation would be:
>
>     while ( unlikely(!spin_trylock(&call_lock)) )
>         smp_call_function_interrupt();

I completely agree.

-- 
Oleksandr Tyshchenko | Embedded Developer
GlobalLogic

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
From: Ian Campbell @ 2014-01-27 17:40 UTC
To: Oleksandr Tyshchenko; +Cc: xen-devel

On Mon, 2014-01-27 at 19:33 +0200, Oleksandr Tyshchenko wrote:
> But I am afraid this solution is incomplete and maybe not enough for
> stable operation. I would appreciate it if you could explain how to
> solve the issue in the right way, or give some advice.

I'll look into this properly tomorrow, but my initial reaction is that
this should be fixed by making those SGIs which are used as IPIs have
higher priority in the GIC than normal interrupts.

Ian.
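For reference, "higher priority in the GIC" amounts to programming a
numerically lower priority value for SGIs 0-15 in the distributor's
GICD_IPRIORITYR registers. A minimal sketch, assuming GICv2 and the GICD[]
register-array accessor style used in xen/arch/arm/gic.c; the constant names
and values here are illustrative, not Xen's:

    /* Illustrative sketch only: make SGIs (interrupts 0-15) more urgent
     * than ordinary IRQs.  In GICv2, GICD_IPRIORITYR holds one 8-bit
     * priority field per interrupt (four per 32-bit word), and lower
     * values mean higher priority. */
    #define SGI_PRI   0x80U   /* assumed value for IPIs (more urgent) */
    #define IRQ_PRI   0xa0U   /* assumed value for normal IRQs        */

    static void gic_raise_sgi_priority(void)
    {
        unsigned int i;

        /* SGIs 0-15 occupy the first four GICD_IPRIORITYR words. */
        for ( i = 0; i < 16 / 4; i++ )
            GICD[GICD_IPRIORITYR + i] = SGI_PRI | (SGI_PRI << 8) |
                                        (SGI_PRI << 16) | (SGI_PRI << 24);
    }

With something like this in place an SGI can preempt a handler running at
IRQ_PRI, which is exactly the property on_selected_cpus relies on when it
busy-waits with the lock held.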

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
From: Julien Grall @ 2014-01-27 17:51 UTC
To: Oleksandr Tyshchenko; +Cc: Stefano Stabellini, Ian Campbell, xen-devel

On 01/27/2014 05:33 PM, Oleksandr Tyshchenko wrote:
> Hi all,

Hello Oleksandr,

> We are trying to bring up Xen on the DRA7XX (OMAP5) platform.
>
> We sometimes see hangs in the hypervisor, and these hangs are related to SMP.
> We found out that a deadlock takes place in the on_selected_cpus function
> when cross-interrupts occur simultaneously.
>
> The issue:
>
> 1. We receive IRQs on the first CPU (say CPU0) and on the second CPU (say CPU1) in parallel.
> 2. In our case the maintenance_interrupt handler for the maintenance IRQ from CPU0 is executed on CPU1, and
>    the handler for the IRQ from CPU1 is executed on CPU0.
> 3. According to the existing logic, gic_irq_eoi has to run on the CPU on which the IRQ was scheduled.

gic_irq_eoi is only called for physical IRQs routed to the guest (eg:
hard drive, network, ...). As far as I remember, these IRQs are only
routed to CPU0.

Do you pass-through PPIs to dom0?

-- 
Julien Grall

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
From: Oleksandr Tyshchenko @ 2014-01-28 19:25 UTC
To: Julien Grall; +Cc: Stefano Stabellini, Ian Campbell, xen-devel

Hello Julien,

Please see inline.

> gic_irq_eoi is only called for physical IRQs routed to the guest (eg:
> hard drive, network, ...). As far as I remember, these IRQs are only
> routed to CPU0.

I understand.

But I have created a debug patch to show the issue:

diff --git a/xen/common/smp.c b/xen/common/smp.c
index 46d2fc6..6123561 100644
--- a/xen/common/smp.c
+++ b/xen/common/smp.c
@@ -22,6 +22,8 @@
 #include <xen/smp.h>
 #include <xen/errno.h>
 
+int locked = 0;
+
 /*
  * Structure and data for smp_call_function()/on_selected_cpus().
  */
@@ -53,11 +55,19 @@ void on_selected_cpus(
 {
     unsigned int nr_cpus;
 
+    locked = 0;
+
     ASSERT(local_irq_is_enabled());
 
     if (!spin_trylock(&call_lock)) {
+
+        locked = 1;
+        printk("\n>>>>> %s: line: %d, cpu_mask_curr: %08lx, cpu_mask_sel: %08lx\n", __func__, __LINE__,
+               cpumask_of(smp_processor_id())->bits[0], selected->bits[0]);
+
         if (smp_call_function_interrupt())
             return;
+
         spin_lock(&call_lock);
     }
 
@@ -78,6 +88,10 @@ void on_selected_cpus(
 
 out:
     spin_unlock(&call_lock);
+
+    if (locked)
+        printk("\n>>>>> %s: line: %d, cpu_mask_curr: %08lx, cpu_mask_sel: %08lx\n", __func__, __LINE__,
+               cpumask_of(smp_processor_id())->bits[0], selected->bits[0]);
 }
 
 int smp_call_function_interrupt(void)
@@ -86,6 +100,10 @@ int smp_call_function_interrupt(void)
     void *info = call_data.info;
     unsigned int cpu = smp_processor_id();
 
+    if (locked)
+        printk("\n>>>>> %s: line: %d, cpu_mask_curr: %08lx, cpu_mask_sel: %08lx\n", __func__, __LINE__,
+               cpumask_of(smp_processor_id())->bits[0], call_data.selected.bits[0]);
+
     if ( !cpumask_test_cpu(cpu, &call_data.selected) )
         return -EPERM;

Our issue (simultaneous cross-interrupts) occurred while booting domU:

[    7.507812] oom_adj 2 => oom_score_adj 117
[    7.507812] oom_adj 4 => oom_score_adj 235
[    7.507812] oom_adj 9 => oom_score_adj 529
[    7.507812] oom_adj 15 => oom_score_adj 1000
[    8.835937] PVR_K:(Error): PVRSRVOpenDCDeviceKM: no devnode matching index 0 [0, ]
(XEN) >>>>> on_selected_cpus: line: 65, cpu_mask_curr: 00000002, cpu_mask_sel: 00000001
(XEN) >>>>> smp_call_function_interrupt: line: 104, cpu_mask_curr: 00000002, cpu_mask_sel: 00000002
(XEN) >>>>> on_selected_cpus: line: 93, cpu_mask_curr: 00000001, cpu_mask_sel: 00000002
(XEN) >>>>> smp_call_function_interrupt: line: 104, cpu_mask_curr: 00000001, cpu_mask_sel: 00000001
(XEN) >>>>> on_selected_cpus: line: 93, cpu_mask_curr: 00000002, cpu_mask_sel: 00000001
(XEN) >>>>> smp_call_function_interrupt: line: 104, cpu_mask_curr: 00000002, cpu_mask_sel: 00000000
[   11.023437] usbcore: registered new interface driver usbfs
[   11.023437] usbcore: registered new interface driver hub
[   11.023437] usbcore: registered new device driver usb
[   11.039062] usbcore: registered new interface driver usbhid
[   11.039062] usbhid: USB HID core driver

> Do you pass-through PPIs to dom0?

If I understand correctly, PPIs are IRQs 16 to 31.
So yes, I do. I see the timer IRQs and the maintenance IRQ routed to both
CPUs.

And I have printed all IRQs that reach the gic_route_irq_to_guest and
gic_route_irq functions:
...
(XEN) GIC initialization:
(XEN)         gic_dist_addr=0000000048211000
(XEN)         gic_cpu_addr=0000000048212000
(XEN)         gic_hyp_addr=0000000048214000
(XEN)         gic_vcpu_addr=0000000048216000
(XEN)         gic_maintenance_irq=25
(XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
(XEN) >>>>> gic_route_irq: irq: 25, cpu_mask: 00000001
(XEN) >>>>> gic_route_irq: irq: 30, cpu_mask: 00000001
(XEN) >>>>> gic_route_irq: irq: 26, cpu_mask: 00000001
(XEN) >>>>> gic_route_irq: irq: 27, cpu_mask: 00000001
(XEN) >>>>> gic_route_irq: irq: 104, cpu_mask: 00000001
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Allocated console ring of 16 KiB.
(XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
(XEN) Bringing up CPU1
(XEN) >>>>> gic_route_irq: irq: 25, cpu_mask: 00000002
(XEN) >>>>> gic_route_irq: irq: 30, cpu_mask: 00000002
(XEN) >>>>> gic_route_irq: irq: 26, cpu_mask: 00000002
(XEN) >>>>> gic_route_irq: irq: 27, cpu_mask: 00000002
(XEN) CPU 1 booted.
(XEN) Brought up 2 CPUs
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 61, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 62, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 63, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 64, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 66, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 67, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 153, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 105, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 106, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 102, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 137, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 138, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 113, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 69, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 70, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 71, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 72, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 73, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 74, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 75, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 76, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 77, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 78, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 79, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 112, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 145, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 158, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 86, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 82, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 83, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 84, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 85, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 187, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 186, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 188, cpu: 0
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 189, cpu: 0
(XEN) Loading kernel from boot module 2
(XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 57, cpu: 0
(XEN) Loading zImage from 00000000c0000040 to 00000000c8008000-00000000c8304eb0
(XEN) Loading dom0 DTB to 0x00000000cfe00000-0x00000000cfe03978
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 252kB init memory.
[    0.000000] /cpus/cpu@0 missing clock-frequency property
[    0.000000] /cpus/cpu@1 missing clock-frequency property
[    0.093750] omap_l3_noc ocp.2: couldn't find resource 2
[    0.265625] ahci ahci.0.auto: can't get clock
[    0.867187] Freeing init memory: 224K
Parsing config from /xen/images/DomUAndroid.cfg
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 105, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 61, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 62, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 63, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 64, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 65, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 66, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 67, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 153, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 69, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 70, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 71, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 72, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 73, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 74, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 75, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 76, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 77, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 78, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 79, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 102, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 137, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 138, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 88, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 89, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 93, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 94, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 92, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 152, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 97, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 98, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 123, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 80, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 115, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 118, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 126, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 128, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 91, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 41, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 42, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 48, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 131, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 44, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 45, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 46, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 47, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 40, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 158, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 146, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 60, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 85, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 87, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 133, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 142, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 143, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 53, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 164, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 51, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 134, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 50, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 108, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 109, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 124, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 125, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 110, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 112, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 68, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 101, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 99, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 100, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 103, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 132, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 56, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 135, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 136, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 139, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 58, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 140, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 141, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 49, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 54, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 55, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 144, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 32, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 33, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 34, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 35, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 36, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 39, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 43, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 52, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 59, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 120, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 90, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 107, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 119, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 121, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 122, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 129, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 130, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 151, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 154, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 155, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 156, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 160, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 162, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 163, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 157, cpu: 1
(XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 173, cpu: 1
Daemon running with PID 569
...

> --
> Julien Grall

* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
From: Oleksandr Tyshchenko @ 2014-01-29 10:56 UTC
To: Julien Grall; +Cc: Stefano Stabellini, Ian Campbell, xen-devel

Hello all,

I just recalled one hack which we created because we needed to route
hardware IRQs to domU:

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 9d793ba..d0227b9 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -989,8 +989,6 @@ static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev,
 
         LOG(DEBUG, "dom%d irq %d", domid, irq);
 
-        ret = irq >= 0 ? xc_physdev_map_pirq(CTX->xch, domid, irq, &irq)
-                       : -EOVERFLOW;
         if (!ret)
             ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1);
         if (ret < 0) {
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 2e4b11f..b54c08e 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -85,7 +85,7 @@ int domain_vgic_init(struct domain *d)
     if ( d->domain_id == 0 )
         d->arch.vgic.nr_lines = gic_number_lines() - 32;
     else
-        d->arch.vgic.nr_lines = 0; /* We don't need SPIs for the guest */
+        d->arch.vgic.nr_lines = gic_number_lines() - 32; /* We do need SPIs for the guest */
 
     d->arch.vgic.shared_irqs =
         xzalloc_array(struct vgic_irq_rank, DOMAIN_NR_RANKS(d));
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 75e2df3..ba88901 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -29,6 +29,7 @@
 #include <asm/page.h>
 #include <public/domctl.h>
 #include <xsm/xsm.h>
+#include <asm/gic.h>
 
 static DEFINE_SPINLOCK(domctl_lock);
 DEFINE_SPINLOCK(vcpu_alloc_lock);
@@ -782,8 +783,11 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
             ret = -EINVAL;
         else if ( xsm_irq_permission(XSM_HOOK, d, pirq, allow) )
             ret = -EPERM;
-        else if ( allow )
-            ret = pirq_permit_access(d, pirq);
+        else if ( allow ) {
+            struct dt_irq irq = {pirq + NR_LOCAL_IRQS, 0};
+            ret = pirq_permit_access(d, irq.irq);
+            gic_route_irq_to_guest(d, &irq, "");
+        }
         else
             ret = pirq_deny_access(d, pirq);
     }

It seems this patch can violate the logic of routing physical IRQs only
to CPU0. In gic_route_irq_to_guest() we need to call
gic_set_irq_properties(), where one of the parameters is
cpumask_of(smp_processor_id()). But in this code path the function can
be executed on CPU1, and as a result the wrong value can be set in the
target CPU mask.

Please confirm my assumption. If I am right, we have to add basic
hardware IRQ routing to domU in the right way.
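For reference, the call being referred to looks roughly like this (a
paraphrased fragment of gic_route_irq_to_guest() in xen/arch/arm/gic.c;
the level argument and the 0xa0 priority value are assumptions from the
surrounding code, not verified exactly):

    /* Inside gic_route_irq_to_guest(): the target CPU mask is derived
     * from whichever physical CPU happens to execute the routing call. */
    gic_set_irq_properties(irq->irq, level,
                           cpumask_of(smp_processor_id()), /* affinity */
                           0xa0);                          /* priority */
    /* When the domctl for domU creation runs on CPU1 (as in the log
     * above), the IRQ is therefore routed to CPU1, breaking the
     * "physical IRQs only on CPU0" assumption. */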
On Tue, Jan 28, 2014 at 9:25 PM, Oleksandr Tyshchenko
<oleksandr.tyshchenko@globallogic.com> wrote:
[...]
46, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 47, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 40, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 158, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 146, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 60, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 85, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 87, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 133, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 142, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 143, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 53, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 164, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 51, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 134, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 50, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 108, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 109, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 124, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 125, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 110, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 112, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 68, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 101, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 99, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 100, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 103, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 132, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 56, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 135, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 136, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 139, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 58, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 140, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 141, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 49, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 54, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 55, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 144, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 32, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 33, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 34, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 35, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 36, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 39, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 43, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 52, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 59, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 120, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 
90, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 107, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 119, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 121, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 122, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 129, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 130, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 151, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 154, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 155, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 156, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 160, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 162, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 163, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 157, cpu: 1 > (XEN) > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 173, cpu: 1 > Daemon running with PID 569 > ... >> >> >> -- >> Julien Grall > > > > > -- > > Name | Title > GlobalLogic > P +x.xxx.xxx.xxxx M +x.xxx.xxx.xxxx S skype > www.globallogic.com > > http://www.globallogic.com/email_disclaimer.txt -- Name | Title GlobalLogic P +x.xxx.xxx.xxxx M +x.xxx.xxx.xxxx S skype www.globallogic.com http://www.globallogic.com/email_disclaimer.txt ^ permalink raw reply related [flat|nested] 48+ messages in thread
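The concern raised in this mail can be made concrete with a small standalone program.
Everything in it -- the types, the mock target-setting helper, the 192-entry table --
is a simplified stand-in for the real Xen/GICv2 code, not an excerpt of it, but it
shows how the programmed target silently follows whichever CPU happens to execute the
routing call:

/* Standalone sketch, not Xen code: all types and helpers are mock-ups. */
#include <stdio.h>

typedef unsigned long cpumask_t;            /* one bit per CPU (mock) */
#define cpumask_of(cpu) ((cpumask_t)1 << (cpu))

static unsigned int current_cpu = 1;        /* CPU running the domctl */
#define smp_processor_id() (current_cpu)

static unsigned char itargets[192];         /* "GIC: 192 lines" in the log above */

/* Mock of the part of gic_set_irq_properties() that picks the target. */
static void set_irq_target(int irq, cpumask_t mask)
{
    itargets[irq] = (unsigned char)mask;    /* GICv2 has at most 8 targets */
}

int main(void)
{
    /* Current behaviour: target follows whichever CPU runs the hypercall. */
    set_irq_target(61, cpumask_of(smp_processor_id()));
    printf("irq 61 -> target %#x (calling CPU)\n", itargets[61]);

    /* Behaviour discussed below: always CPU0, matching the EOI path. */
    set_irq_target(61, cpumask_of(0));
    printf("irq 61 -> target %#x (hardcoded CPU0)\n", itargets[61]);
    return 0;
}

With the domctl hack above, nothing guarantees the routing call runs on CPU0, which
is exactly the mismatch Oleksandr describes.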
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 10:56 ` Oleksandr Tyshchenko
@ 2014-01-29 11:42 ` Stefano Stabellini
  2014-01-29 11:46 ` Stefano Stabellini
  2014-02-04 16:20 ` [PATCH] xen/arm: route irqs to cpu0 Stefano Stabellini
  2014-01-29 13:07 ` [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix Julien Grall
  1 sibling, 2 replies; 48+ messages in thread
From: Stefano Stabellini @ 2014-01-29 11:42 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

On Wed, 29 Jan 2014, Oleksandr Tyshchenko wrote:
> Hello all,
>
> I just recollected about one hack which we created
> as we needed to route HW IRQ in domU.
>
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 9d793ba..d0227b9 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -989,8 +989,6 @@ static void domcreate_launch_dm(libxl__egc *egc,
> libxl__multidev *multidev,
>
>     LOG(DEBUG, "dom%d irq %d", domid, irq);
>
> -    ret = irq >= 0 ? xc_physdev_map_pirq(CTX->xch, domid, irq, &irq)
> -                   : -EOVERFLOW;
>     if (!ret)
>         ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1);
>     if (ret < 0) {
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 2e4b11f..b54c08e 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -85,7 +85,7 @@ int domain_vgic_init(struct domain *d)
>     if ( d->domain_id == 0 )
>         d->arch.vgic.nr_lines = gic_number_lines() - 32;
>     else
> -        d->arch.vgic.nr_lines = 0; /* We don't need SPIs for the guest */
> +        d->arch.vgic.nr_lines = gic_number_lines() - 32; /* We do
> need SPIs for the guest */
>
>     d->arch.vgic.shared_irqs =
>         xzalloc_array(struct vgic_irq_rank, DOMAIN_NR_RANKS(d));
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 75e2df3..ba88901 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -29,6 +29,7 @@
>  #include <asm/page.h>
>  #include <public/domctl.h>
>  #include <xsm/xsm.h>
> +#include <asm/gic.h>
>
>  static DEFINE_SPINLOCK(domctl_lock);
>  DEFINE_SPINLOCK(vcpu_alloc_lock);
> @@ -782,8 +783,11 @@ long
> do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>             ret = -EINVAL;
>         else if ( xsm_irq_permission(XSM_HOOK, d, pirq, allow) )
>             ret = -EPERM;
> -        else if ( allow )
> -            ret = pirq_permit_access(d, pirq);
> +        else if ( allow ) {
> +            struct dt_irq irq = {pirq + NR_LOCAL_IRQS,0};
> +            ret = pirq_permit_access(d, irq.irq);
> +            gic_route_irq_to_guest(d, &irq, "");
> +        }
>         else
>             ret = pirq_deny_access(d, pirq);
>     }
>
> It seems, the following patch can violate the logic about routing
> physical IRQs only to CPU0.
> In gic_route_irq_to_guest() we need to call gic_set_irq_properties()
> where the one of the parameters is cpumask_of(smp_processor_id()).
> But in this part of code this function can be executed on CPU1. And as
> result this can cause to the fact that the wrong value would set to
> target CPU mask.
>
> Please, confirm my assumption.

That is correct.

> If I am right we have to add a basic HW IRQ routing to DomU in a right way.

We could add the cpumask parameter to gic_route_irq_to_guest. Or maybe
for now we could just hardcode the cpumask of cpu0 in
gic_route_irq_to_guest.

However keep in mind that if you plan on routing SPIs to guests other
than dom0, receiving all the interrupts on cpu0 might not be great for
performance.

It is impressive how small this patch is, if this is all that is needed
to get irq routing to guests working.
> On Tue, Jan 28, 2014 at 9:25 PM, Oleksandr Tyshchenko > <oleksandr.tyshchenko@globallogic.com> wrote: > > Hello Julien, > > > > Please see inline > > > >> gic_irq_eoi is only called for physical IRQ routed to the guest (eg: > >> hard drive, network, ...). As far as I remember, these IRQs are only > >> routed to CPU0. > > > > > > I understand. > > > > But, I have created debug patch to show the issue: > > > > diff --git a/xen/common/smp.c b/xen/common/smp.c > > index 46d2fc6..6123561 100644 > > --- a/xen/common/smp.c > > +++ b/xen/common/smp.c > > @@ -22,6 +22,8 @@ > > #include <xen/smp.h> > > #include <xen/errno.h> > > > > +int locked = 0; > > + > > /* > > * Structure and data for smp_call_function()/on_selected_cpus(). > > */ > > @@ -53,11 +55,19 @@ void on_selected_cpus( > > { > > unsigned int nr_cpus; > > > > + locked = 0; > > + > > ASSERT(local_irq_is_enabled()); > > > > if (!spin_trylock(&call_lock)) { > > + > > + locked = 1; > > + printk("\n>>>>> %s: line: %d, cpu_mask_curr: %08lx, cpu_mask_sel: > > %08lx\n", __func__, __LINE__, > > + cpumask_of(smp_processor_id())->bits[0], > > selected->bits[0]); > > + > > if (smp_call_function_interrupt()) > > return; > > + > > spin_lock(&call_lock); > > } > > > > @@ -78,6 +88,10 @@ void on_selected_cpus( > > > > out: > > spin_unlock(&call_lock); > > + > > + if (locked) > > + printk("\n>>>>> %s: line: %d, cpu_mask_curr: %08lx, cpu_mask_sel: > > %08lx\n", __func__, __LINE__, > > + cpumask_of(smp_processor_id())->bits[0], selected->bits[0]); > > } > > > > int smp_call_function_interrupt(void) > > @@ -86,6 +100,10 @@ int smp_call_function_interrupt(void) > > void *info = call_data.info; > > unsigned int cpu = smp_processor_id(); > > > > + if (locked) > > + printk("\n>>>>> %s: line: %d, cpu_mask_curr: %08lx, cpu_mask_sel: > > %08lx\n", __func__, __LINE__, > > + cpumask_of(smp_processor_id())->bits[0], > > call_data.selected.bits[0]); > > + > > if ( !cpumask_test_cpu(cpu, &call_data.selected) ) > > return -EPERM; > > > > Our issue (simultaneous cross-interrupts) has occurred during boot domU: > > > > [ 7.507812] oom_adj 2 => oom_score_adj 117 > > [ 7.507812] oom_adj 4 => oom_score_adj 235 > > [ 7.507812] oom_adj 9 => oom_score_adj 529 > > [ 7.507812] oom_adj 15 => oom_score_adj 1000 > > [ 8.835937] PVR_K:(Error): PVRSRVOpenDCDeviceKM: no devnode matching > > index 0 [0, ] > > (XEN) > > (XEN) >>>>> on_selected_cpus: line: 65, cpu_mask_curr: 00000002, > > cpu_mask_sel: 00000001 > > (XEN) > > (XEN) >>>>> smp_call_function_interrupt: line: 104, cpu_mask_curr: 00000002, > > cpu_mask_sel: 00000002 > > (XEN) > > (XEN) >>>>> on_selected_cpus: line: 93, cpu_mask_curr: 00000001, > > cpu_mask_sel: 00000002 > > (XEN) > > (XEN) >>>>> smp_call_function_interrupt: line: 104, cpu_mask_curr: 00000001, > > cpu_mask_sel: 00000001 > > (XEN) > > (XEN) >>>>> on_selected_cpus: line: 93, cpu_mask_curr: 00000002, > > cpu_mask_sel: 00000001 > > (XEN) > > (XEN) >>>>> smp_call_function_interrupt: line: 104, cpu_mask_curr: 00000002, > > cpu_mask_sel: 00000000 > > [ 11.023437] usbcore: registered new interface driver usbfs > > [ 11.023437] usbcore: registered new interface driver hub > > [ 11.023437] usbcore: registered new device driver usb > > [ 11.039062] usbcore: registered new interface driver usbhid > > [ 11.039062] usbhid: USB HID core driver > > > >> > >> Do you pass-through PPIs to dom0? > > > > > > If I understand correctly that PPIs is irqs from 16 to 31. > > So yes, I do. I see timer's irqs and maintenance irq which routed to both > > CPUs. 
> > > > And I have printed all irqs which fall to gic_route_irq_to_guest and > > gic_route_irq functions. > > ... > > (XEN) GIC initialization: > > (XEN) gic_dist_addr=0000000048211000 > > (XEN) gic_cpu_addr=0000000048212000 > > (XEN) gic_hyp_addr=0000000048214000 > > (XEN) gic_vcpu_addr=0000000048216000 > > (XEN) gic_maintenance_irq=25 > > (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b). > > (XEN) > > (XEN) >>>>> gic_route_irq: irq: 25, cpu_mask: 00000001 > > (XEN) > > (XEN) >>>>> gic_route_irq: irq: 30, cpu_mask: 00000001 > > (XEN) > > (XEN) >>>>> gic_route_irq: irq: 26, cpu_mask: 00000001 > > (XEN) > > (XEN) >>>>> gic_route_irq: irq: 27, cpu_mask: 00000001 > > (XEN) > > (XEN) >>>>> gic_route_irq: irq: 104, cpu_mask: 00000001 > > (XEN) Using scheduler: SMP Credit Scheduler (credit) > > (XEN) Allocated console ring of 16 KiB. > > (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0 > > (XEN) Bringing up CPU1 > > (XEN) > > (XEN) >>>>> gic_route_irq: irq: 25, cpu_mask: 00000002 > > (XEN) > > (XEN) >>>>> gic_route_irq: irq: 30, cpu_mask: 00000002 > > (XEN) > > (XEN) >>>>> gic_route_irq: irq: 26, cpu_mask: 00000002 > > (XEN) > > (XEN) >>>>> gic_route_irq: irq: 27, cpu_mask: 00000002 > > (XEN) CPU 1 booted. > > (XEN) Brought up 2 CPUs > > (XEN) *** LOADING DOMAIN 0 *** > > (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0) > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 61, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 62, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 63, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 64, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 66, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 67, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 153, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 105, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 106, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 102, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 137, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 138, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 113, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 69, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 70, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 71, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 72, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 73, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 74, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 75, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 76, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 77, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 78, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 79, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 112, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 145, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 158, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 86, cpu: 0 > > (XEN) > > (XEN) >>>>> 
gic_route_irq_to_guest: domid: 0, irq: 82, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 83, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 84, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 85, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 187, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 186, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 188, cpu: 0 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 189, cpu: 0 > > (XEN) Loading kernel from boot module 2 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 57, cpu: 0 > > (XEN) Loading zImage from 00000000c0000040 to > > 00000000c8008000-00000000c8304eb0 > > (XEN) Loading dom0 DTB to 0x00000000cfe00000-0x00000000cfe03978 > > (XEN) Std. Loglevel: All > > (XEN) Guest Loglevel: All > > (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to > > Xen) > > (XEN) Freed 252kB init memory. > > [ 0.000000] /cpus/cpu@0 missing clock-frequency property > > [ 0.000000] /cpus/cpu@1 missing clock-frequency property > > [ 0.093750] omap_l3_noc ocp.2: couldn't find resource 2 > > [ 0.265625] ahci ahci.0.auto: can't get clock > > [ 0.867187] Freeing init memory: 224K > > Parsing config from /xen/images/DomUAndroid.cfg > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 105, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 61, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 62, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 63, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 64, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 65, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 66, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 67, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 153, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 69, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 70, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 71, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 72, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 73, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 74, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 75, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 76, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 77, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 78, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 79, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 102, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 137, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 138, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 88, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 89, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 93, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 94, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 92, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: 
domid: 1, irq: 152, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 97, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 98, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 123, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 80, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 115, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 118, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 126, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 128, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 91, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 41, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 42, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 48, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 131, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 44, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 45, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 46, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 47, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 40, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 158, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 146, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 60, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 85, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 87, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 133, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 142, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 143, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 53, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 164, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 51, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 134, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 50, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 108, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 109, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 124, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 125, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 110, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 112, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 68, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 101, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 99, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 100, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 103, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 132, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 56, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 135, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 136, cpu: 1 > > 
(XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 139, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 58, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 140, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 141, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 49, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 54, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 55, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 144, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 32, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 33, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 34, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 35, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 36, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 39, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 43, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 52, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 59, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 120, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 90, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 107, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 119, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 121, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 122, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 129, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 130, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 151, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 154, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 155, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 156, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 160, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 162, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 163, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 157, cpu: 1 > > (XEN) > > (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 173, cpu: 1 > > Daemon running with PID 569 > > ... > >> > >> > >> -- > >> Julien Grall > > > > > > > > > > -- > > > > Name | Title > > GlobalLogic > > P +x.xxx.xxx.xxxx M +x.xxx.xxx.xxxx S skype > > www.globallogic.com > > > > http://www.globallogic.com/email_disclaimer.txt > > > > -- > > Name | Title > GlobalLogic > P +x.xxx.xxx.xxxx M +x.xxx.xxx.xxxx S skype > www.globallogic.com > > http://www.globallogic.com/email_disclaimer.txt > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel > ^ permalink raw reply [flat|nested] 48+ messages in thread
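Stefano's first suggestion above -- making the target CPU an explicit parameter of
gic_route_irq_to_guest() instead of deriving it from smp_processor_id() -- might look
like the interface sketch below. The exact signature is hypothetical; nothing of this
form was posted in the thread:

/* Interface-only sketch of the "add a cpumask parameter" idea.
 * The struct names mirror Xen 4.4-era code; the new parameter is
 * an assumption, not a committed interface. */
struct domain;
struct dt_irq;
struct cpumask;

int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq,
                           const char *devname,
                           const struct cpumask *target /* new */);

/* A caller such as the domain builder would then pass cpumask_of(0)
 * (or the mask of the domain's pinned VCPU) explicitly, so the
 * routing decision no longer depends on which pCPU runs the
 * hypercall. */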
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix 2014-01-29 11:42 ` Stefano Stabellini @ 2014-01-29 11:46 ` Stefano Stabellini 2014-01-29 13:15 ` Julien Grall 2014-02-04 15:32 ` Julien Grall 2014-02-04 16:20 ` [PATCH] xen/arm: route irqs to cpu0 Stefano Stabellini 1 sibling, 2 replies; 48+ messages in thread From: Stefano Stabellini @ 2014-01-29 11:46 UTC (permalink / raw) To: Stefano Stabellini Cc: Oleksandr Tyshchenko, Julien Grall, Ian Campbell, xen-devel On Wed, 29 Jan 2014, Stefano Stabellini wrote: > On Wed, 29 Jan 2014, Oleksandr Tyshchenko wrote: > > Hello all, > > > > I just recollected about one hack which we created > > as we needed to route HW IRQ in domU. > > > > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > > index 9d793ba..d0227b9 100644 > > --- a/tools/libxl/libxl_create.c > > +++ b/tools/libxl/libxl_create.c > > @@ -989,8 +989,6 @@ static void domcreate_launch_dm(libxl__egc *egc, > > libxl__multidev *multidev, > > > > LOG(DEBUG, "dom%d irq %d", domid, irq); > > > > - ret = irq >= 0 ? xc_physdev_map_pirq(CTX->xch, domid, irq, &irq) > > - : -EOVERFLOW; > > if (!ret) > > ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1); > > if (ret < 0) { > > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c > > index 2e4b11f..b54c08e 100644 > > --- a/xen/arch/arm/vgic.c > > +++ b/xen/arch/arm/vgic.c > > @@ -85,7 +85,7 @@ int domain_vgic_init(struct domain *d) > > if ( d->domain_id == 0 ) > > d->arch.vgic.nr_lines = gic_number_lines() - 32; > > else > > - d->arch.vgic.nr_lines = 0; /* We don't need SPIs for the guest */ > > + d->arch.vgic.nr_lines = gic_number_lines() - 32; /* We do > > need SPIs for the guest */ > > > > d->arch.vgic.shared_irqs = > > xzalloc_array(struct vgic_irq_rank, DOMAIN_NR_RANKS(d)); > > diff --git a/xen/common/domctl.c b/xen/common/domctl.c > > index 75e2df3..ba88901 100644 > > --- a/xen/common/domctl.c > > +++ b/xen/common/domctl.c > > @@ -29,6 +29,7 @@ > > #include <asm/page.h> > > #include <public/domctl.h> > > #include <xsm/xsm.h> > > +#include <asm/gic.h> > > > > static DEFINE_SPINLOCK(domctl_lock); > > DEFINE_SPINLOCK(vcpu_alloc_lock); > > @@ -782,8 +783,11 @@ long > > do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) > > ret = -EINVAL; > > else if ( xsm_irq_permission(XSM_HOOK, d, pirq, allow) ) > > ret = -EPERM; > > - else if ( allow ) > > - ret = pirq_permit_access(d, pirq); > > + else if ( allow ) { > > + struct dt_irq irq = {pirq + NR_LOCAL_IRQS,0}; > > + ret = pirq_permit_access(d, irq.irq); > > + gic_route_irq_to_guest(d, &irq, ""); > > + } > > else > > ret = pirq_deny_access(d, pirq); > > } > > (END) > > > > It seems, the following patch can violate the logic about routing > > physical IRQs only to CPU0. > > In gic_route_irq_to_guest() we need to call gic_set_irq_properties() > > where the one of the parameters is cpumask_of(smp_processor_id()). > > But in this part of code this function can be executed on CPU1. And as > > result this can cause to the fact that the wrong value would set to > > target CPU mask. > > > > Please, confirm my assumption. > > That is correct. > > > > If I am right we have to add a basic HW IRQ routing to DomU in a right way. > > We could add the cpumask parameter to gic_route_irq_to_guest. Or maybe > for now we could just hardcode the cpumask of cpu0 > gic_route_irq_to_guest. > > However keep in mind that if you plan on routing SPIs to guests other > than dom0, receiving all the interrupts on cpu0 might not be great for > performances. 
Thinking twice about it, it might be the only acceptable change for 4.4. diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index e6257a7..af96a31 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -776,8 +795,7 @@ int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq, level = dt_irq_is_level_triggered(irq); - gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()), - 0xa0); + gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0); retval = __setup_irq(desc, irq->irq, action); if (retval) { ^ permalink raw reply related [flat|nested] 48+ messages in thread
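Why a one-line change is enough at the hardware level: on a GICv2 distributor each
SPI has a one-byte target field in GICD_ITARGETSR (offset 0x800, one bit per CPU
interface), which is presumably what gic_set_irq_properties() ultimately programs, so
cpumask_of(0) becomes target byte 0x01. A runnable toy version, with an in-memory
array standing in for the real MMIO region:

#include <stdint.h>
#include <stdio.h>

#define GICD_ITARGETSR  0x800   /* GICv2: one target byte per INTID */

/* Fake 4KiB "distributor" so the sketch actually runs; on hardware
 * this would be the MMIO region at gic_dist_addr (0x48211000 in the
 * boot log earlier in this thread). */
static uint8_t dist[0x1000];

static void set_spi_target(uint8_t *d, int irq, uint8_t cpumask)
{
    /* bit 0 = CPU interface 0, bit 1 = CPU interface 1, ... */
    d[GICD_ITARGETSR + irq] = cpumask;
}

int main(void)
{
    set_spi_target(dist, 61, 1u << 0);  /* cpumask_of(0) -> 0x01 */
    printf("irq 61 target byte: %#x\n", dist[GICD_ITARGETSR + 61]);
    return 0;
}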
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix 2014-01-29 11:46 ` Stefano Stabellini @ 2014-01-29 13:15 ` Julien Grall 2014-02-04 15:32 ` Julien Grall 1 sibling, 0 replies; 48+ messages in thread From: Julien Grall @ 2014-01-29 13:15 UTC (permalink / raw) To: Stefano Stabellini; +Cc: Oleksandr Tyshchenko, Ian Campbell, xen-devel On 29/01/14 11:46, Stefano Stabellini wrote: > On Wed, 29 Jan 2014, Stefano Stabellini wrote: >> On Wed, 29 Jan 2014, Oleksandr Tyshchenko wrote: >>> Hello all, >>> >>> I just recollected about one hack which we created >>> as we needed to route HW IRQ in domU. >>> >>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c >>> index 9d793ba..d0227b9 100644 >>> --- a/tools/libxl/libxl_create.c >>> +++ b/tools/libxl/libxl_create.c >>> @@ -989,8 +989,6 @@ static void domcreate_launch_dm(libxl__egc *egc, >>> libxl__multidev *multidev, >>> >>> LOG(DEBUG, "dom%d irq %d", domid, irq); >>> >>> - ret = irq >= 0 ? xc_physdev_map_pirq(CTX->xch, domid, irq, &irq) >>> - : -EOVERFLOW; >>> if (!ret) >>> ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1); >>> if (ret < 0) { >>> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c >>> index 2e4b11f..b54c08e 100644 >>> --- a/xen/arch/arm/vgic.c >>> +++ b/xen/arch/arm/vgic.c >>> @@ -85,7 +85,7 @@ int domain_vgic_init(struct domain *d) >>> if ( d->domain_id == 0 ) >>> d->arch.vgic.nr_lines = gic_number_lines() - 32; >>> else >>> - d->arch.vgic.nr_lines = 0; /* We don't need SPIs for the guest */ >>> + d->arch.vgic.nr_lines = gic_number_lines() - 32; /* We do >>> need SPIs for the guest */ >>> >>> d->arch.vgic.shared_irqs = >>> xzalloc_array(struct vgic_irq_rank, DOMAIN_NR_RANKS(d)); >>> diff --git a/xen/common/domctl.c b/xen/common/domctl.c >>> index 75e2df3..ba88901 100644 >>> --- a/xen/common/domctl.c >>> +++ b/xen/common/domctl.c >>> @@ -29,6 +29,7 @@ >>> #include <asm/page.h> >>> #include <public/domctl.h> >>> #include <xsm/xsm.h> >>> +#include <asm/gic.h> >>> >>> static DEFINE_SPINLOCK(domctl_lock); >>> DEFINE_SPINLOCK(vcpu_alloc_lock); >>> @@ -782,8 +783,11 @@ long >>> do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) >>> ret = -EINVAL; >>> else if ( xsm_irq_permission(XSM_HOOK, d, pirq, allow) ) >>> ret = -EPERM; >>> - else if ( allow ) >>> - ret = pirq_permit_access(d, pirq); >>> + else if ( allow ) { >>> + struct dt_irq irq = {pirq + NR_LOCAL_IRQS,0}; >>> + ret = pirq_permit_access(d, irq.irq); >>> + gic_route_irq_to_guest(d, &irq, ""); >>> + } >>> else >>> ret = pirq_deny_access(d, pirq); >>> } >>> (END) >>> >>> It seems, the following patch can violate the logic about routing >>> physical IRQs only to CPU0. >>> In gic_route_irq_to_guest() we need to call gic_set_irq_properties() >>> where the one of the parameters is cpumask_of(smp_processor_id()). >>> But in this part of code this function can be executed on CPU1. And as >>> result this can cause to the fact that the wrong value would set to >>> target CPU mask. >>> >>> Please, confirm my assumption. >> >> That is correct. >> >> >>> If I am right we have to add a basic HW IRQ routing to DomU in a right way. >> >> We could add the cpumask parameter to gic_route_irq_to_guest. Or maybe >> for now we could just hardcode the cpumask of cpu0 >> gic_route_irq_to_guest. >> >> However keep in mind that if you plan on routing SPIs to guests other >> than dom0, receiving all the interrupts on cpu0 might not be great for >> performances. > > Thinking twice about it, it might be the only acceptable change for 4.4. 
In Xen upstream, gic_route_irq_to_guest is only called while dom0 is
being built (so on CPU0). I don't think we need this patch for Xen 4.4.

--
Julien Grall

^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 11:46 ` Stefano Stabellini
  2014-01-29 13:15 ` Julien Grall
@ 2014-02-04 15:32 ` Julien Grall
  1 sibling, 0 replies; 48+ messages in thread
From: Julien Grall @ 2014-02-04 15:32 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: Oleksandr Tyshchenko, Ian Campbell, xen-devel

Hi Stefano,

On 01/29/2014 11:46 AM, Stefano Stabellini wrote:
> Thinking twice about it, it might be the only acceptable change for 4.4.

Thinking about it further, if Ian's patch to prioritize the IPI
(http://www.gossamer-threads.com/lists/xen/devel/315342?do=post_view_threaded)
is not pushed for Xen 4.4, this patch might be useful for Oleksandr.

Can you send it with a commit message and a Signed-off-by?

> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index e6257a7..af96a31 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -776,8 +795,7 @@ int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq,
>
>      level = dt_irq_is_level_triggered(irq);
>
> -    gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()),
> -                           0xa0);
> +    gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0);

I would add a TODO before the function and perhaps explain why...

Cheers,

--
Julien Grall

^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH] xen/arm: route irqs to cpu0
  2014-01-29 11:42 ` Stefano Stabellini
  2014-01-29 11:46 ` Stefano Stabellini
@ 2014-02-04 16:20 ` Stefano Stabellini
  2014-02-04 16:32 ` Julien Grall
  2014-02-19 13:43 ` Julien Grall
  1 sibling, 2 replies; 48+ messages in thread
From: Stefano Stabellini @ 2014-02-04 16:20 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Oleksandr Tyshchenko, George Dunlap, Julien Grall, Ian Campbell, xen-devel

gic_route_irq_to_guest routes all IRQs to
cpumask_of(smp_processor_id()), but actually it is always called on cpu0.
To avoid confusion and possible issues in case someone modified the code
and reassigned a particular irq to a cpu other than cpu0, hardcode
cpumask_of(0).

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index e6257a7..8854800 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -776,8 +776,8 @@ int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq,

     level = dt_irq_is_level_triggered(irq);

-    gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()),
-                           0xa0);
+    /* TODO: handle routing irqs to cpus != cpu0 */
+    gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0);

     retval = __setup_irq(desc, irq->irq, action);
     if (retval) {

^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH] xen/arm: route irqs to cpu0 2014-02-04 16:20 ` [PATCH] xen/arm: route irqs to cpu0 Stefano Stabellini @ 2014-02-04 16:32 ` Julien Grall 2014-02-04 16:56 ` Oleksandr Tyshchenko 2014-02-19 13:43 ` Julien Grall 1 sibling, 1 reply; 48+ messages in thread From: Julien Grall @ 2014-02-04 16:32 UTC (permalink / raw) To: Stefano Stabellini Cc: Oleksandr Tyshchenko, George Dunlap, Ian Campbell, xen-devel On 02/04/2014 04:20 PM, Stefano Stabellini wrote: > gic_route_irq_to_guest routes all IRQs to > cpumask_of(smp_processor_id()), but actually it always called on cpu0. > To avoid confusion and possible issues in case someone modified the code > and reassigned a particular irq to a cpu other than cpu0, hardcode > cpumask_of(0). > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org> > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > index e6257a7..8854800 100644 > --- a/xen/arch/arm/gic.c > +++ b/xen/arch/arm/gic.c > @@ -776,8 +776,8 @@ int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq, > > level = dt_irq_is_level_triggered(irq); > > - gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()), > - 0xa0); > + /* TODO: handle routing irqs to cpus != cpu0 */ > + gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0); > > retval = __setup_irq(desc, irq->irq, action); > if (retval) { > -- Julien Grall ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH] xen/arm: route irqs to cpu0 2014-02-04 16:32 ` Julien Grall @ 2014-02-04 16:56 ` Oleksandr Tyshchenko 0 siblings, 0 replies; 48+ messages in thread From: Oleksandr Tyshchenko @ 2014-02-04 16:56 UTC (permalink / raw) To: Julien Grall; +Cc: George Dunlap, xen-devel, Ian Campbell, Stefano Stabellini Thanks. I need this patch. On Tue, Feb 4, 2014 at 6:32 PM, Julien Grall <julien.grall@linaro.org> wrote: > On 02/04/2014 04:20 PM, Stefano Stabellini wrote: >> gic_route_irq_to_guest routes all IRQs to >> cpumask_of(smp_processor_id()), but actually it always called on cpu0. >> To avoid confusion and possible issues in case someone modified the code >> and reassigned a particular irq to a cpu other than cpu0, hardcode >> cpumask_of(0). >> >> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> > Acked-by: Julien Grall <julien.grall@linaro.org> > >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c >> index e6257a7..8854800 100644 >> --- a/xen/arch/arm/gic.c >> +++ b/xen/arch/arm/gic.c >> @@ -776,8 +776,8 @@ int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq, >> >> level = dt_irq_is_level_triggered(irq); >> >> - gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()), >> - 0xa0); >> + /* TODO: handle routing irqs to cpus != cpu0 */ >> + gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0); >> >> retval = __setup_irq(desc, irq->irq, action); >> if (retval) { >> > > > -- > Julien Grall -- Name | Title GlobalLogic P +x.xxx.xxx.xxxx M +x.xxx.xxx.xxxx S skype www.globallogic.com http://www.globallogic.com/email_disclaimer.txt ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH] xen/arm: route irqs to cpu0
  2014-02-04 16:20 ` [PATCH] xen/arm: route irqs to cpu0 Stefano Stabellini
  2014-02-04 16:32 ` Julien Grall
@ 2014-02-19 13:43 ` Julien Grall
  2014-02-19 13:53 ` Ian Campbell
  1 sibling, 1 reply; 48+ messages in thread
From: Julien Grall @ 2014-02-19 13:43 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Oleksandr Tyshchenko, George Dunlap, Ian Campbell, xen-devel

Hi all,

Ping? It would be nice to have this patch for Xen 4.4, as the IPI priority
patch won't be pushed before the release.

The patch is a minor change and won't impact normal use. When dom0 is
built, Xen always does it on CPU 0.

Regards,

On 02/04/2014 04:20 PM, Stefano Stabellini wrote:
> gic_route_irq_to_guest routes all IRQs to
> cpumask_of(smp_processor_id()), but actually it always called on cpu0.
> To avoid confusion and possible issues in case someone modified the code
> and reassigned a particular irq to a cpu other than cpu0, hardcode
> cpumask_of(0).
>
> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index e6257a7..8854800 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -776,8 +776,8 @@ int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq,
>
>      level = dt_irq_is_level_triggered(irq);
>
> -    gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()),
> -                           0xa0);
> +    /* TODO: handle routing irqs to cpus != cpu0 */
> +    gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0);
>
>      retval = __setup_irq(desc, irq->irq, action);
>      if (retval) {
>

--
Julien Grall

^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH] xen/arm: route irqs to cpu0 2014-02-19 13:43 ` Julien Grall @ 2014-02-19 13:53 ` Ian Campbell 2014-02-19 14:15 ` George Dunlap 0 siblings, 1 reply; 48+ messages in thread From: Ian Campbell @ 2014-02-19 13:53 UTC (permalink / raw) To: Julien Grall Cc: Oleksandr Tyshchenko, xen-devel, George Dunlap, Stefano Stabellini On Wed, 2014-02-19 at 13:43 +0000, Julien Grall wrote: > Hi all, > > Ping? No one made a case for a release exception so I put it in my 4.5 pile. > It would be nice to have this patch for Xen 4.4 as IPI priority > patch won't be pushed before the release. > > The patch is a minor change and won't impact normal use. When dom0 is > built, Xen always do it on CPU 0. Right, so whoever is doing otherwise already has a big pile of patches I presume? It's rather late to be making such changes IMHO, but I'll defer to George. > > Regards, > > On 02/04/2014 04:20 PM, Stefano Stabellini wrote: > > gic_route_irq_to_guest routes all IRQs to > > cpumask_of(smp_processor_id()), but actually it always called on cpu0. > > To avoid confusion and possible issues in case someone modified the code > > and reassigned a particular irq to a cpu other than cpu0, hardcode > > cpumask_of(0). > > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> > > > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > > index e6257a7..8854800 100644 > > --- a/xen/arch/arm/gic.c > > +++ b/xen/arch/arm/gic.c > > @@ -776,8 +776,8 @@ int gic_route_irq_to_guest(struct domain *d, const struct dt_irq *irq, > > > > level = dt_irq_is_level_triggered(irq); > > > > - gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()), > > - 0xa0); > > + /* TODO: handle routing irqs to cpus != cpu0 */ > > + gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0); > > > > retval = __setup_irq(desc, irq->irq, action); > > if (retval) { > > > > > > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH] xen/arm: route irqs to cpu0 2014-02-19 13:53 ` Ian Campbell @ 2014-02-19 14:15 ` George Dunlap 2014-02-20 14:52 ` Stefano Stabellini 0 siblings, 1 reply; 48+ messages in thread From: George Dunlap @ 2014-02-19 14:15 UTC (permalink / raw) To: Ian Campbell, Julien Grall Cc: Oleksandr Tyshchenko, xen-devel, George Dunlap, Stefano Stabellini On 02/19/2014 01:53 PM, Ian Campbell wrote: > On Wed, 2014-02-19 at 13:43 +0000, Julien Grall wrote: >> Hi all, >> >> Ping? > No one made a case for a release exception so I put it in my 4.5 pile. > >> It would be nice to have this patch for Xen 4.4 as IPI priority >> patch won't be pushed before the release. >> >> The patch is a minor change and won't impact normal use. When dom0 is >> built, Xen always do it on CPU 0. > Right, so whoever is doing otherwise already has a big pile of patches I > presume? > > It's rather late to be making such changes IMHO, but I'll defer to > George. I can't figure out from the description what's the advantage of having it in 4.4. -George ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH] xen/arm: route irqs to cpu0 2014-02-19 14:15 ` George Dunlap @ 2014-02-20 14:52 ` Stefano Stabellini 2014-02-21 11:12 ` George Dunlap 0 siblings, 1 reply; 48+ messages in thread From: Stefano Stabellini @ 2014-02-20 14:52 UTC (permalink / raw) To: George Dunlap Cc: Ian Campbell, Stefano Stabellini, Julien Grall, George Dunlap, xen-devel, Oleksandr Tyshchenko On Wed, 19 Feb 2014, George Dunlap wrote: > On 02/19/2014 01:53 PM, Ian Campbell wrote: > > On Wed, 2014-02-19 at 13:43 +0000, Julien Grall wrote: > > > Hi all, > > > > > > Ping? > > No one made a case for a release exception so I put it in my 4.5 pile. > > > > > It would be nice to have this patch for Xen 4.4 as IPI priority > > > patch won't be pushed before the release. > > > > > > The patch is a minor change and won't impact normal use. When dom0 is > > > built, Xen always do it on CPU 0. > > Right, so whoever is doing otherwise already has a big pile of patches I > > presume? > > > > It's rather late to be making such changes IMHO, but I'll defer to > > George. > > I can't figure out from the description what's the advantage of having it in > 4.4. People that use the default configuration won't see any differences but people that manually modify Xen to start a second domain and assign a device to it would. To give you a concrete example, it fixes a deadlock reported by Oleksandr Tyshchenko: http://marc.info/?l=xen-devel&m=139099606402232 ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH] xen/arm: route irqs to cpu0 2014-02-20 14:52 ` Stefano Stabellini @ 2014-02-21 11:12 ` George Dunlap 2014-02-21 11:59 ` Julien Grall 0 siblings, 1 reply; 48+ messages in thread From: George Dunlap @ 2014-02-21 11:12 UTC (permalink / raw) To: Stefano Stabellini Cc: Oleksandr Tyshchenko, Julien Grall, xen-devel@lists.xen.org, Ian Campbell, George Dunlap On Thu, Feb 20, 2014 at 2:52 PM, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote: > On Wed, 19 Feb 2014, George Dunlap wrote: >> On 02/19/2014 01:53 PM, Ian Campbell wrote: >> > On Wed, 2014-02-19 at 13:43 +0000, Julien Grall wrote: >> > > Hi all, >> > > >> > > Ping? >> > No one made a case for a release exception so I put it in my 4.5 pile. >> > >> > > It would be nice to have this patch for Xen 4.4 as IPI priority >> > > patch won't be pushed before the release. >> > > >> > > The patch is a minor change and won't impact normal use. When dom0 is >> > > built, Xen always do it on CPU 0. >> > Right, so whoever is doing otherwise already has a big pile of patches I >> > presume? >> > >> > It's rather late to be making such changes IMHO, but I'll defer to >> > George. >> >> I can't figure out from the description what's the advantage of having it in >> 4.4. > > People that use the default configuration won't see any differences but > people that manually modify Xen to start a second domain and assign a > device to it would. > To give you a concrete example, it fixes a deadlock reported by > Oleksandr Tyshchenko: > > http://marc.info/?l=xen-devel&m=139099606402232 Right -- I think if I had been cc'd when the patch was submitted, I would have said yes for sure; but at this point I think we just want to get 4.4.0 out without any more delays if possible. -George ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH] xen/arm: route irqs to cpu0 2014-02-21 11:12 ` George Dunlap @ 2014-02-21 11:59 ` Julien Grall 2014-02-21 12:07 ` George Dunlap 0 siblings, 1 reply; 48+ messages in thread From: Julien Grall @ 2014-02-21 11:59 UTC (permalink / raw) To: George Dunlap Cc: Oleksandr Tyshchenko, George Dunlap, xen-devel@lists.xen.org, Ian Campbell, Stefano Stabellini On 02/21/2014 11:12 AM, George Dunlap wrote: > On Thu, Feb 20, 2014 at 2:52 PM, Stefano Stabellini > <stefano.stabellini@eu.citrix.com> wrote: >> On Wed, 19 Feb 2014, George Dunlap wrote: >>> On 02/19/2014 01:53 PM, Ian Campbell wrote: >>>> On Wed, 2014-02-19 at 13:43 +0000, Julien Grall wrote: >>>>> Hi all, >>>>> >>>>> Ping? >>>> No one made a case for a release exception so I put it in my 4.5 pile. >>>> >>>>> It would be nice to have this patch for Xen 4.4 as IPI priority >>>>> patch won't be pushed before the release. >>>>> >>>>> The patch is a minor change and won't impact normal use. When dom0 is >>>>> built, Xen always do it on CPU 0. >>>> Right, so whoever is doing otherwise already has a big pile of patches I >>>> presume? >>>> >>>> It's rather late to be making such changes IMHO, but I'll defer to >>>> George. >>> >>> I can't figure out from the description what's the advantage of having it in >>> 4.4. >> >> People that use the default configuration won't see any differences but >> people that manually modify Xen to start a second domain and assign a >> device to it would. >> To give you a concrete example, it fixes a deadlock reported by >> Oleksandr Tyshchenko: >> >> http://marc.info/?l=xen-devel&m=139099606402232 > > Right -- I think if I had been cc'd when the patch was submitted, I > would have said yes for sure; but at this point I think we just want > to get 4.4.0 out without any more delays if possible. You were already CCed from the beginning :). -- Julien Grall ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH] xen/arm: route irqs to cpu0 2014-02-21 11:59 ` Julien Grall @ 2014-02-21 12:07 ` George Dunlap 0 siblings, 0 replies; 48+ messages in thread From: George Dunlap @ 2014-02-21 12:07 UTC (permalink / raw) To: Julien Grall Cc: Oleksandr Tyshchenko, George Dunlap, xen-devel@lists.xen.org, Ian Campbell, Stefano Stabellini On 02/21/2014 11:59 AM, Julien Grall wrote: > On 02/21/2014 11:12 AM, George Dunlap wrote: >> On Thu, Feb 20, 2014 at 2:52 PM, Stefano Stabellini >> <stefano.stabellini@eu.citrix.com> wrote: >>> On Wed, 19 Feb 2014, George Dunlap wrote: >>>> On 02/19/2014 01:53 PM, Ian Campbell wrote: >>>>> On Wed, 2014-02-19 at 13:43 +0000, Julien Grall wrote: >>>>>> Hi all, >>>>>> >>>>>> Ping? >>>>> No one made a case for a release exception so I put it in my 4.5 pile. >>>>> >>>>>> It would be nice to have this patch for Xen 4.4 as IPI priority >>>>>> patch won't be pushed before the release. >>>>>> >>>>>> The patch is a minor change and won't impact normal use. When dom0 is >>>>>> built, Xen always do it on CPU 0. >>>>> Right, so whoever is doing otherwise already has a big pile of patches I >>>>> presume? >>>>> >>>>> It's rather late to be making such changes IMHO, but I'll defer to >>>>> George. >>>> I can't figure out from the description what's the advantage of having it in >>>> 4.4. >>> People that use the default configuration won't see any differences but >>> people that manually modify Xen to start a second domain and assign a >>> device to it would. >>> To give you a concrete example, it fixes a deadlock reported by >>> Oleksandr Tyshchenko: >>> >>> http://marc.info/?l=xen-devel&m=139099606402232 >> Right -- I think if I had been cc'd when the patch was submitted, I >> would have said yes for sure; but at this point I think we just want >> to get 4.4.0 out without any more delays if possible. > You were already CCed from the beginning :). Right. :-) But as Ian said, no one made a case for a release exception, and I got a bit tired of having to ask every time. :-) -George ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 10:56 ` Oleksandr Tyshchenko
  2014-01-29 11:42 ` Stefano Stabellini
@ 2014-01-29 13:07 ` Julien Grall
  2014-01-29 13:22 ` Stefano Stabellini
  1 sibling, 1 reply; 48+ messages in thread
From: Julien Grall @ 2014-01-29 13:07 UTC (permalink / raw)
To: Oleksandr Tyshchenko; +Cc: Stefano Stabellini, Ian Campbell, xen-devel

On 29/01/14 10:56, Oleksandr Tyshchenko wrote:
> Hello all,
>
> I just remembered a hack which we created
> when we needed to route a HW IRQ to domU.
>
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 9d793ba..d0227b9 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -989,8 +989,6 @@ static void domcreate_launch_dm(libxl__egc *egc,
> libxl__multidev *multidev,
>
>          LOG(DEBUG, "dom%d irq %d", domid, irq);
>
> -        ret = irq >= 0 ? xc_physdev_map_pirq(CTX->xch, domid, irq, &irq)
> -                       : -EOVERFLOW;
>          if (!ret)
>              ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1);
>          if (ret < 0) {
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 2e4b11f..b54c08e 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -85,7 +85,7 @@ int domain_vgic_init(struct domain *d)
>      if ( d->domain_id == 0 )
>          d->arch.vgic.nr_lines = gic_number_lines() - 32;
>      else
> -        d->arch.vgic.nr_lines = 0; /* We don't need SPIs for the guest */
> +        d->arch.vgic.nr_lines = gic_number_lines() - 32; /* We do
> need SPIs for the guest */
>
>      d->arch.vgic.shared_irqs =
>          xzalloc_array(struct vgic_irq_rank, DOMAIN_NR_RANKS(d));
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 75e2df3..ba88901 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -29,6 +29,7 @@
>  #include <asm/page.h>
>  #include <public/domctl.h>
>  #include <xsm/xsm.h>
> +#include <asm/gic.h>
>
>  static DEFINE_SPINLOCK(domctl_lock);
>  DEFINE_SPINLOCK(vcpu_alloc_lock);
> @@ -782,8 +783,11 @@ long
> do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>              ret = -EINVAL;
>          else if ( xsm_irq_permission(XSM_HOOK, d, pirq, allow) )
>              ret = -EPERM;
> -        else if ( allow )
> -            ret = pirq_permit_access(d, pirq);
> +        else if ( allow ) {
> +            struct dt_irq irq = {pirq + NR_LOCAL_IRQS,0};
> +            ret = pirq_permit_access(d, irq.irq);
> +            gic_route_irq_to_guest(d, &irq, "");
> +        }
>          else
>              ret = pirq_deny_access(d, pirq);
>      }
> (END)
>
> It seems the patch above can violate the logic of routing
> physical IRQs only to CPU0.

I forgot the smp_processor_id() in gic_route_irq_to_guest(). As this
function is only called (for upstream) while dom0 is being built, only
CPU0 is used.

> In gic_route_irq_to_guest() we need to call gic_set_irq_properties(),
> where one of the parameters is cpumask_of(smp_processor_id()).
> But in this code path the function can be executed on CPU1, and as a
> result the wrong value can be set in the target CPU mask.
> Please confirm my assumption.
> If I am right, we have to add basic HW IRQ routing to domU in the right way.

With the current implementation, the IRQ will be routed to the CPU that
made the hypercall. This CPU could be different from the CPU where the
domU (assuming we have only 1 VCPU) is running. I think that, for both
dom0 and domU, if the VCPU is pinned we should use the cpumask of that
VCPU.

--
Julien Grall

^ permalink raw reply	[flat|nested] 48+ messages in thread
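To make the routing policy Julien describes concrete, the target selection could look roughly like the sketch below. This is a minimal illustration, not the actual Xen code; the helper name, the use of VCPU0, and the field name v->cpu_affinity are assumptions for a 4.4-era tree.

/* Pick the physical CPU mask for routing a guest SPI: if the domain's
 * VCPU0 is pinned to exactly one pcpu, target that pcpu; otherwise
 * fall back to CPU0, which always exists and is where dom0 is built. */
static const cpumask_t *guest_irq_target(struct domain *d)
{
    struct vcpu *v = d->vcpu[0];                /* assume VCPU0 takes the IRQ */

    if ( cpumask_weight(v->cpu_affinity) == 1 ) /* pinned to a single pcpu? */
        return v->cpu_affinity;

    return cpumask_of(0);                       /* safe default */
}

gic_route_irq_to_guest() would then pass guest_irq_target(d) to gic_set_irq_properties() instead of cpumask_of(smp_processor_id()).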
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix 2014-01-29 13:07 ` [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix Julien Grall @ 2014-01-29 13:22 ` Stefano Stabellini 2014-01-29 18:40 ` Oleksandr Tyshchenko 0 siblings, 1 reply; 48+ messages in thread From: Stefano Stabellini @ 2014-01-29 13:22 UTC (permalink / raw) To: Julien Grall Cc: Oleksandr Tyshchenko, Stefano Stabellini, Ian Campbell, xen-devel On Wed, 29 Jan 2014, Julien Grall wrote: > On 29/01/14 10:56, Oleksandr Tyshchenko wrote: > > Hello all, > > > > I just recollected about one hack which we created > > as we needed to route HW IRQ in domU. > > > > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > > index 9d793ba..d0227b9 100644 > > --- a/tools/libxl/libxl_create.c > > +++ b/tools/libxl/libxl_create.c > > @@ -989,8 +989,6 @@ static void domcreate_launch_dm(libxl__egc *egc, > > libxl__multidev *multidev, > > > > LOG(DEBUG, "dom%d irq %d", domid, irq); > > > > - ret = irq >= 0 ? xc_physdev_map_pirq(CTX->xch, domid, irq, &irq) > > - : -EOVERFLOW; > > if (!ret) > > ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1); > > if (ret < 0) { > > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c > > index 2e4b11f..b54c08e 100644 > > --- a/xen/arch/arm/vgic.c > > +++ b/xen/arch/arm/vgic.c > > @@ -85,7 +85,7 @@ int domain_vgic_init(struct domain *d) > > if ( d->domain_id == 0 ) > > d->arch.vgic.nr_lines = gic_number_lines() - 32; > > else > > - d->arch.vgic.nr_lines = 0; /* We don't need SPIs for the guest */ > > + d->arch.vgic.nr_lines = gic_number_lines() - 32; /* We do > > need SPIs for the guest */ > > > > d->arch.vgic.shared_irqs = > > xzalloc_array(struct vgic_irq_rank, DOMAIN_NR_RANKS(d)); > > diff --git a/xen/common/domctl.c b/xen/common/domctl.c > > index 75e2df3..ba88901 100644 > > --- a/xen/common/domctl.c > > +++ b/xen/common/domctl.c > > @@ -29,6 +29,7 @@ > > #include <asm/page.h> > > #include <public/domctl.h> > > #include <xsm/xsm.h> > > +#include <asm/gic.h> > > > > static DEFINE_SPINLOCK(domctl_lock); > > DEFINE_SPINLOCK(vcpu_alloc_lock); > > @@ -782,8 +783,11 @@ long > > do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) > > ret = -EINVAL; > > else if ( xsm_irq_permission(XSM_HOOK, d, pirq, allow) ) > > ret = -EPERM; > > - else if ( allow ) > > - ret = pirq_permit_access(d, pirq); > > + else if ( allow ) { > > + struct dt_irq irq = {pirq + NR_LOCAL_IRQS,0}; > > + ret = pirq_permit_access(d, irq.irq); > > + gic_route_irq_to_guest(d, &irq, ""); > > + } > > else > > ret = pirq_deny_access(d, pirq); > > } > > (END) > > > > It seems, the following patch can violate the logic about routing > > physical IRQs only to CPU0. > > I forgot the smp_processor_id() in gic_route_irq_to_guest(). As this function > is only called (for upstream) when dom0 is building, only CPU0 is used. Right, that's why changing it to cpumask_of(0) shouldn't make any difference for xen-unstable (it should make things clearer, if nothing else) but it should fix things for Oleksandr. ^ permalink raw reply [flat|nested] 48+ messages in thread
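To spell out why routing every SPI to CPU0 should remove the deadlock, here is the EOI dispatch from the end of maintenance_interrupt() (quoted in full later in this thread), annotated under the assumption that desc->arch.eoi_cpu records the pcpu that received the physical IRQ:

/* With every SPI targeted at CPU0, desc->arch.eoi_cpu is always 0. */
if ( cpu == smp_processor_id() )
    gic_irq_eoi((void*)(uintptr_t)pirq);  /* CPU0 always takes this branch */
else
    on_selected_cpus(cpumask_of(cpu),     /* other CPUs only ever IPI CPU0 */
                     gic_irq_eoi, (void*)(uintptr_t)pirq, 0);

The deadlock needs two CPUs waiting on each other's cross-call. With all hardware IRQs on CPU0, every cross-CPU EOI points towards CPU0, and CPU0 itself never busy-waits on another CPU, so the circular wait cannot form.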
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 13:22 ` Stefano Stabellini
@ 2014-01-29 18:40 ` Oleksandr Tyshchenko
  2014-01-29 18:43 ` Oleksandr Tyshchenko
  2014-01-29 18:49 ` Julien Grall
  0 siblings, 2 replies; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2014-01-29 18:40 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

> Right, that's why changing it to cpumask_of(0) shouldn't make any
> difference for xen-unstable (it should make things clearer, if nothing
> else) but it should fix things for Oleksandr.

Unfortunately, it is not enough for stable operation.

I tried to use cpumask_of(smp_processor_id()) instead of cpumask_of(0) in
gic_route_irq_to_guest(). As a result, I no longer see the situation that
caused the deadlock in the on_selected_cpus function (as expected).
But the hypervisor sometimes hangs somewhere else (I have not yet
identified where), or I sometimes see traps like the one below (the
"WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads to them):

(XEN) CPU1: Unexpected Trap: Undefined Instruction
(XEN) ----[ Xen-4.4-unstable arm32 debug=y Not tainted ]----
(XEN) CPU: 1
(XEN) PC: 00242c1c __warn+0x20/0x28
(XEN) CPSR: 200001da MODE:Hypervisor
(XEN) R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff
(XEN) R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000
(XEN) R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc R12:00000002
(XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c
(XEN)
(XEN) VTCR_EL2: 80002558
(XEN) VTTBR_EL2: 00020000dec6a000
(XEN)
(XEN) SCTLR_EL2: 30cd187f
(XEN) HCR_EL2: 00000000000028b5
(XEN) TTBR0_EL2: 00000000d2014000
(XEN)
(XEN) ESR_EL2: 00000000
(XEN) HPFAR_EL2: 0000000000482110
(XEN) HDFAR: fa211190
(XEN) HIFAR: 00000000
(XEN)
(XEN) Xen stack trace from sp=4bfd7eb4:
(XEN) 0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 00000097 00000001
(XEN) 00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 00000019 00000000
(XEN) 40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 4bfd7f58 00000000
(XEN) 00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 00000097 00000097
(XEN) 00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 0024f4b8 4bfd7f58
(XEN) 00251830 ea80c950 00000000 00000001 c0079a90 00000097 00000097 00000000
(XEN) fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c ffffffff b6efbca3
(XEN) c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 e9879eb8 c007680c
(XEN) c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 40000193 00000000
(XEN) ffeffbfe fedeefff fffd5ffe
(XEN) Xen call trace:
(XEN) [<00242c1c>] __warn+0x20/0x28 (PC)
(XEN) [<00242c1c>] __warn+0x20/0x28 (LR)
(XEN) [<00247a54>] maintenance_interrupt+0xfc/0x2f4
(XEN) [<00248e60>] do_IRQ+0x138/0x198
(XEN) [<00248978>] gic_interrupt+0x58/0xc0
(XEN) [<0024f4b8>] do_trap_irq+0x10/0x14
(XEN) [<00251830>] return_from_trap+0/0x4
(XEN)

Also I am posting
the irq */ cpu = p->desc->arch.eoi_cpu; eoi = 1; pirq = p->desc->irq; } if ( !atomic_dec_and_test(&p->inflight_cnt) ) { /* Physical IRQ can't be reinject */ WARN_ON(p->desc != NULL); gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority); spin_unlock_irq(&gic.lock); i++; continue; } GICH[GICH_LR + i] = 0; clear_bit(i, &this_cpu(lr_mask)); if ( !list_empty(&v->arch.vgic.lr_pending) ) { n = list_entry(v->arch.vgic.lr_pending.next, typeof(*n), lr_queue); gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority); list_del_init(&n->lr_queue); set_bit(i, &this_cpu(lr_mask)); } else { gic_inject_irq_stop(); } spin_unlock_irq(&gic.lock); spin_lock_irq(&v->arch.vgic.lock); list_del_init(&p->inflight); spin_unlock_irq(&v->arch.vgic.lock); if ( eoi ) { /* this is not racy because we can't receive another irq of the * same type until we EOI it. */ if ( cpu == smp_processor_id() ) gic_irq_eoi((void*)(uintptr_t)pirq); else on_selected_cpus(cpumask_of(cpu), gic_irq_eoi, (void*)(uintptr_t)pirq, 0); } i++; } } Oleksandr Tyshchenko | Embedded Developer GlobalLogic ^ permalink raw reply [flat|nested] 48+ messages in thread
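A note on why the EOI has to be bounced to eoi_cpu at all: GICC is the per-processor CPU interface of the GIC, and the deactivation must be performed on the same interface that acknowledged the interrupt. The callback itself is tiny; a sketch of its 4.4-era shape (the exact body is an assumption, not a quote from the tree above):

static void gic_irq_eoi(void *info)
{
    int pirq = (uintptr_t) info;

    /* Deactivate the physical interrupt on this CPU's interface,
     * allowing the line to be signalled again. */
    GICC[GICC_DIR] = pirq;
}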
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 18:40 ` Oleksandr Tyshchenko
@ 2014-01-29 18:43 ` Oleksandr Tyshchenko
  0 siblings, 0 replies; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2014-01-29 18:43 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

Sorry, just a small correction. Where I wrote:

> I tried to use cpumask_of(smp_processor_id()) instead of cpumask_of(0) in

it should read: cpumask_of(0) instead of cpumask_of(smp_processor_id()).

^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 18:40 ` Oleksandr Tyshchenko
  2014-01-29 18:43 ` Oleksandr Tyshchenko
@ 2014-01-29 18:49 ` Julien Grall
  2014-01-29 19:54 ` Oleksandr Tyshchenko
  2014-01-30 13:24 ` Stefano Stabellini
  1 sibling, 2 replies; 48+ messages in thread
From: Julien Grall @ 2014-01-29 18:49 UTC (permalink / raw)
To: Oleksandr Tyshchenko; +Cc: xen-devel, Ian Campbell, Stefano Stabellini

Hi,

It's weird, a physical IRQ should not be injected twice ...
Were you able to print the IRQ number?

In any case, you are using an old version of the interrupt patch series.
Your new error may come from a race condition in this code.

Can you try to use a newer version?

On 29 Jan 2014 18:40, "Oleksandr Tyshchenko"
<oleksandr.tyshchenko@globallogic.com> wrote:
> > Right, that's why changing it to cpumask_of(0) shouldn't make any
> > difference for xen-unstable (it should make things clearer, if nothing
> > else) but it should fix things for Oleksandr.
>
> Unfortunately, it is not enough for stable operation.
>
> I tried to use cpumask_of(smp_processor_id()) instead of cpumask_of(0)
> in
> gic_route_irq_to_guest(). As a result, I no longer see the situation
> that caused the deadlock in the on_selected_cpus function (as expected).
> But the hypervisor sometimes hangs somewhere else (I have not yet
> identified where), or I sometimes see traps like the one below (the
> "WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads to them):
>
> (XEN) CPU1: Unexpected Trap: Undefined Instruction
> (XEN) ----[ Xen-4.4-unstable arm32 debug=y Not tainted ]----
> (XEN) CPU: 1
> (XEN) PC: 00242c1c __warn+0x20/0x28
> (XEN) CPSR: 200001da MODE:Hypervisor
> (XEN) R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff
> (XEN) R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000
> (XEN) R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc R12:00000002
> (XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c
> (XEN)
> (XEN) VTCR_EL2: 80002558
> (XEN) VTTBR_EL2: 00020000dec6a000
> (XEN)
> (XEN) SCTLR_EL2: 30cd187f
> (XEN) HCR_EL2: 00000000000028b5
> (XEN) TTBR0_EL2: 00000000d2014000
> (XEN)
> (XEN) ESR_EL2: 00000000
> (XEN) HPFAR_EL2: 0000000000482110
> (XEN) HDFAR: fa211190
> (XEN) HIFAR: 00000000
> (XEN)
> (XEN) Xen stack trace from sp=4bfd7eb4:
> (XEN) 0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 00000097 00000001
> (XEN) 00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 00000019 00000000
> (XEN) 40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 4bfd7f58 00000000
> (XEN) 00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 00000097 00000097
> (XEN) 00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 0024f4b8 4bfd7f58
> (XEN) 00251830 ea80c950 00000000 00000001 c0079a90 00000097 00000097 00000000
> (XEN) fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c ffffffff b6efbca3
> (XEN) c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 e9879eb8 c007680c
> (XEN) c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 00000000 00000000
> (XEN) 00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 40000193 00000000
> (XEN) ffeffbfe fedeefff fffd5ffe
> (XEN) Xen call trace:
> (XEN) [<00242c1c>] __warn+0x20/0x28 (PC)
> (XEN) [<00242c1c>] __warn+0x20/0x28 (LR)
> (XEN) [<00247a54>] maintenance_interrupt+0xfc/0x2f4
> (XEN) [<00248e60>] do_IRQ+0x138/0x198
> (XEN) [<00248978>] gic_interrupt+0x58/0xc0
> (XEN) [<0024f4b8>] do_trap_irq+0x10/0x14
> (XEN) [<00251830>] return_from_trap+0/0x4
> (XEN)
>
> Also I am posting maintenance_interrupt() from my tree:
>
> static void maintenance_interrupt(int irq, void *dev_id, struct
> cpu_user_regs *regs)
> {
>     int i = 0, virq, pirq;
>     uint32_t lr;
>     struct vcpu *v = current;
>     uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) GICH[GICH_EISR1]) << 32);
>
>     while ((i = find_next_bit((const long unsigned int *) &eisr,
>                               64, i)) < 64) {
>         struct pending_irq *p, *n;
>         int cpu, eoi;
>
>         cpu = -1;
>         eoi = 0;
>
>         spin_lock_irq(&gic.lock);
>         lr = GICH[GICH_LR + i];
>         virq = lr & GICH_LR_VIRTUAL_MASK;
>
>         p = irq_to_pending(v, virq);
>         if ( p->desc != NULL ) {
>             p->desc->status &= ~IRQ_INPROGRESS;
>             /* Assume only one pcpu needs to EOI the irq */
>             cpu = p->desc->arch.eoi_cpu;
>             eoi = 1;
>             pirq = p->desc->irq;
>         }
>         if ( !atomic_dec_and_test(&p->inflight_cnt) )
>         {
>             /* Physical IRQ can't be reinject */
>             WARN_ON(p->desc != NULL);
>             gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority);
>             spin_unlock_irq(&gic.lock);
>             i++;
>             continue;
>         }
>
>         GICH[GICH_LR + i] = 0;
>         clear_bit(i, &this_cpu(lr_mask));
>
>         if ( !list_empty(&v->arch.vgic.lr_pending) ) {
>             n = list_entry(v->arch.vgic.lr_pending.next, typeof(*n), lr_queue);
>             gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority);
>             list_del_init(&n->lr_queue);
>             set_bit(i, &this_cpu(lr_mask));
>         } else {
>             gic_inject_irq_stop();
>         }
>         spin_unlock_irq(&gic.lock);
>
>         spin_lock_irq(&v->arch.vgic.lock);
>         list_del_init(&p->inflight);
>         spin_unlock_irq(&v->arch.vgic.lock);
>
>         if ( eoi ) {
>             /* this is not racy because we can't receive another irq of the
>              * same type until we EOI it. */
>             if ( cpu == smp_processor_id() )
>                 gic_irq_eoi((void*)(uintptr_t)pirq);
>             else
>                 on_selected_cpus(cpumask_of(cpu),
>                                  gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
>         }
>
>         i++;
>     }
> }
>
> Oleksandr Tyshchenko | Embedded Developer
> GlobalLogic

^ permalink raw reply	[flat|nested] 48+ messages in thread
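Julien's question about the IRQ number could be answered with a one-off debug change along these lines (illustrative only, against the maintenance_interrupt() quoted above; the printk format is an assumption):

if ( !atomic_dec_and_test(&p->inflight_cnt) )
{
    /* Physical IRQ can't be reinject */
    if ( p->desc != NULL )
        printk("HW irq injected twice: virq=%d pirq=%d\n",
               p->irq, p->desc->irq);
    WARN_ON(p->desc != NULL);
    /* ... rest of the branch unchanged ... */
}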
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 18:49 ` Julien Grall
@ 2014-01-29 19:54 ` Oleksandr Tyshchenko
  2014-01-30  0:42 ` Julien Grall
  1 sibling, 1 reply; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2014-01-29 19:54 UTC (permalink / raw)
To: Julien Grall; +Cc: xen-devel, Ian Campbell, Stefano Stabellini

On Wed, Jan 29, 2014 at 8:49 PM, Julien Grall <julien.grall@linaro.org> wrote:
> Hi,
>
> It's weird, a physical IRQ should not be injected twice ...
> Were you able to print the IRQ number?

p->irq: 151, p->desc->irq: 151 (it is the touchscreen IRQ)

>
> In any case, you are using an old version of the interrupt patch series.
> Your new error may come from a race condition in this code.
>
> Can you try to use a newer version?

Yes, I can, but not today or tomorrow: I need time to apply, on top of
the newest version, the local changes our system needs to work.
And do you mean trying the newest version of Xen as a whole, or only
the parts of the code related to interrupt handling?

>
> On 29 Jan 2014 18:40, "Oleksandr Tyshchenko"
> <oleksandr.tyshchenko@globallogic.com> wrote:
>>
>> > Right, that's why changing it to cpumask_of(0) shouldn't make any
>> > difference for xen-unstable (it should make things clearer, if nothing
>> > else) but it should fix things for Oleksandr.
>>
>> Unfortunately, it is not enough for stable operation.
>>
>> I tried to use cpumask_of(smp_processor_id()) instead of cpumask_of(0)
>> in
>> gic_route_irq_to_guest(). As a result, I no longer see the situation
>> that caused the deadlock in the on_selected_cpus function (as expected).
>> But the hypervisor sometimes hangs somewhere else (I have not yet
>> identified where), or I sometimes see traps like the one below (the
>> "WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads to them):
>>
>> (XEN) CPU1: Unexpected Trap: Undefined Instruction
>> (XEN) ----[ Xen-4.4-unstable arm32 debug=y Not tainted ]----
>> (XEN) CPU: 1
>> (XEN) PC: 00242c1c __warn+0x20/0x28
>> (XEN) CPSR: 200001da MODE:Hypervisor
>> (XEN) R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff
>> (XEN) R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000
>> (XEN) R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc R12:00000002
>> (XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c
>> (XEN)
>> (XEN) VTCR_EL2: 80002558
>> (XEN) VTTBR_EL2: 00020000dec6a000
>> (XEN)
>> (XEN) SCTLR_EL2: 30cd187f
>> (XEN) HCR_EL2: 00000000000028b5
>> (XEN) TTBR0_EL2: 00000000d2014000
>> (XEN)
>> (XEN) ESR_EL2: 00000000
>> (XEN) HPFAR_EL2: 0000000000482110
>> (XEN) HDFAR: fa211190
>> (XEN) HIFAR: 00000000
>> (XEN)
>> (XEN) Xen stack trace from sp=4bfd7eb4:
>> (XEN) 0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 00000097 00000001
>> (XEN) 00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 00000019 00000000
>> (XEN) 40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 4bfd7f58 00000000
>> (XEN) 00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 00000097 00000097
>> (XEN) 00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 0024f4b8 4bfd7f58
>> (XEN) 00251830 ea80c950 00000000 00000001 c0079a90 00000097 00000097 00000000
>> (XEN) fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c ffffffff b6efbca3
>> (XEN) c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 e9879eb8 c007680c
>> (XEN) c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 00000000 00000000
>> (XEN) 00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 40000193 00000000
>> (XEN) ffeffbfe fedeefff fffd5ffe
>> (XEN) Xen call trace:
>> (XEN)
[<00242c1c>] __warn+0x20/0x28 (PC) >> (XEN) [<00242c1c>] __warn+0x20/0x28 (LR) >> (XEN) [<00247a54>] maintenance_interrupt+0xfc/0x2f4 >> (XEN) [<00248e60>] do_IRQ+0x138/0x198 >> (XEN) [<00248978>] gic_interrupt+0x58/0xc0 >> (XEN) [<0024f4b8>] do_trap_irq+0x10/0x14 >> (XEN) [<00251830>] return_from_trap+0/0x4 >> (XEN) >> >> Also I am posting maintenance_interrupt() from my tree: >> >> static void maintenance_interrupt(int irq, void *dev_id, struct >> cpu_user_regs *regs) >> { >> int i = 0, virq, pirq; >> uint32_t lr; >> struct vcpu *v = current; >> uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) GICH[GICH_EISR1]) << >> 32); >> >> while ((i = find_next_bit((const long unsigned int *) &eisr, >> 64, i)) < 64) { >> struct pending_irq *p, *n; >> int cpu, eoi; >> >> cpu = -1; >> eoi = 0; >> >> spin_lock_irq(&gic.lock); >> lr = GICH[GICH_LR + i]; >> virq = lr & GICH_LR_VIRTUAL_MASK; >> >> p = irq_to_pending(v, virq); >> if ( p->desc != NULL ) { >> p->desc->status &= ~IRQ_INPROGRESS; >> /* Assume only one pcpu needs to EOI the irq */ >> cpu = p->desc->arch.eoi_cpu; >> eoi = 1; >> pirq = p->desc->irq; >> } >> if ( !atomic_dec_and_test(&p->inflight_cnt) ) >> { >> /* Physical IRQ can't be reinject */ >> WARN_ON(p->desc != NULL); >> gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority); >> spin_unlock_irq(&gic.lock); >> i++; >> continue; >> } >> >> GICH[GICH_LR + i] = 0; >> clear_bit(i, &this_cpu(lr_mask)); >> >> if ( !list_empty(&v->arch.vgic.lr_pending) ) { >> n = list_entry(v->arch.vgic.lr_pending.next, typeof(*n), >> lr_queue); >> gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority); >> list_del_init(&n->lr_queue); >> set_bit(i, &this_cpu(lr_mask)); >> } else { >> gic_inject_irq_stop(); >> } >> spin_unlock_irq(&gic.lock); >> >> spin_lock_irq(&v->arch.vgic.lock); >> list_del_init(&p->inflight); >> spin_unlock_irq(&v->arch.vgic.lock); >> >> if ( eoi ) { >> /* this is not racy because we can't receive another irq of >> the >> * same type until we EOI it. */ >> if ( cpu == smp_processor_id() ) >> gic_irq_eoi((void*)(uintptr_t)pirq); >> else >> on_selected_cpus(cpumask_of(cpu), >> gic_irq_eoi, (void*)(uintptr_t)pirq, 0); >> } >> >> i++; >> } >> } >> >> >> Oleksandr Tyshchenko | Embedded Developer >> GlobalLogic -- Oleksandr Tyshchenko | Embedded Developer GlobalLogic ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 19:54 ` Oleksandr Tyshchenko
@ 2014-01-30  0:42 ` Julien Grall
  0 siblings, 0 replies; 48+ messages in thread
From: Julien Grall @ 2014-01-30 0:42 UTC (permalink / raw)
To: Oleksandr Tyshchenko; +Cc: xen-devel, Ian Campbell, Stefano Stabellini

On 29/01/14 19:54, Oleksandr Tyshchenko wrote:
> On Wed, Jan 29, 2014 at 8:49 PM, Julien Grall <julien.grall@linaro.org> wrote:
>> Hi,
>>
>> It's weird, a physical IRQ should not be injected twice ...
>> Were you able to print the IRQ number?
>
> p->irq: 151, p->desc->irq: 151 (it is the touchscreen IRQ)
>
>> In any case, you are using an old version of the interrupt patch series.
>> Your new error may come from a race condition in this code.
>>
>> Can you try to use a newer version?
>
> Yes, I can, but not today or tomorrow: I need time to apply, on top of
> the newest version, the local changes our system needs to work.
> And do you mean trying the newest version of Xen as a whole, or only
> the parts of the code related to interrupt handling?

I don't think you need the newest version. Which Xen commit are you
using? This range of commits should be enough: 82064f0 - 88eb95e.

--
Julien Grall

^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-29 18:49 ` Julien Grall
  2014-01-29 19:54 ` Oleksandr Tyshchenko
@ 2014-01-30 13:24 ` Stefano Stabellini
  2014-01-30 15:06 ` Oleksandr Tyshchenko
  1 sibling, 1 reply; 48+ messages in thread
From: Stefano Stabellini @ 2014-01-30 13:24 UTC (permalink / raw)
To: Julien Grall
Cc: Oleksandr Tyshchenko, xen-devel, Ian Campbell, Stefano Stabellini

Is it a level or an edge irq?

On Wed, 29 Jan 2014, Julien Grall wrote:
> Hi,
>
> It's weird, physical IRQ should not be injected twice ...
> Were you able to print the IRQ number?
>
> In any case, you are using the old version of the interrupt patch series.
> Your new error may come of race condition in this code.
>
> Can you try to use a newest version?
>
> On 29 Jan 2014 18:40, "Oleksandr Tyshchenko" <oleksandr.tyshchenko@globallogic.com> wrote:
> > Right, that's why changing it to cpumask_of(0) shouldn't make any
> > difference for xen-unstable (it should make things clearer, if nothing
> > else) but it should fix things for Oleksandr.
>
> Unfortunately, it is not enough for stable work.
>
> I was tried to use cpumask_of(smp_processor_id()) instead of cpumask_of(0) in
> gic_route_irq_to_guest(). And as result, I don't see our situation
> which cause to deadlock in on_selected_cpus function (expected).
> But, hypervisor sometimes hangs somewhere else (I have not identified
> yet where this is happening) or I sometimes see traps, like that:
> ("WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads to them)
>
> (XEN) CPU1: Unexpected Trap: Undefined Instruction
> (XEN) ----[ Xen-4.4-unstable arm32 debug=y Not tainted ]----
> (XEN) CPU: 1
> (XEN) PC: 00242c1c __warn+0x20/0x28
> (XEN) CPSR: 200001da MODE:Hypervisor
> (XEN) R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff
> (XEN) R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000
> (XEN) R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc R12:00000002
> (XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c
> (XEN)
> (XEN) VTCR_EL2: 80002558
> (XEN) VTTBR_EL2: 00020000dec6a000
> (XEN)
> (XEN) SCTLR_EL2: 30cd187f
> (XEN) HCR_EL2: 00000000000028b5
> (XEN) TTBR0_EL2: 00000000d2014000
> (XEN)
> (XEN) ESR_EL2: 00000000
> (XEN) HPFAR_EL2: 0000000000482110
> (XEN) HDFAR: fa211190
> (XEN) HIFAR: 00000000
> (XEN)
> (XEN) Xen stack trace from sp=4bfd7eb4:
> (XEN) 0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 00000097 00000001
> (XEN) 00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 00000019 00000000
> (XEN) 40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 4bfd7f58 00000000
> (XEN) 00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 00000097 00000097
> (XEN) 00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 0024f4b8 4bfd7f58
> (XEN) 00251830 ea80c950 00000000 00000001 c0079a90 00000097 00000097 00000000
> (XEN) fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c ffffffff b6efbca3
> (XEN) c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 e9879eb8 c007680c
> (XEN) c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 00000000 00000000
> (XEN) 00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 40000193 00000000
> (XEN) ffeffbfe fedeefff fffd5ffe
> (XEN) Xen call trace:
> (XEN) [<00242c1c>] __warn+0x20/0x28 (PC)
> (XEN) [<00242c1c>] __warn+0x20/0x28 (LR)
> (XEN) [<00247a54>] maintenance_interrupt+0xfc/0x2f4
> (XEN) [<00248e60>] do_IRQ+0x138/0x198
> (XEN) [<00248978>] gic_interrupt+0x58/0xc0
> (XEN) [<0024f4b8>] do_trap_irq+0x10/0x14
> (XEN) [<00251830>] return_from_trap+0/0x4
> (XEN)
>
> Also I am posting maintenance_interrupt() from my tree:
>
> static void maintenance_interrupt(int irq, void *dev_id, struct
> cpu_user_regs *regs)
> {
>     int i = 0, virq, pirq;
>     uint32_t lr;
>     struct vcpu *v = current;
>     uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) GICH[GICH_EISR1]) << 32);
>
>     while ((i = find_next_bit((const long unsigned int *) &eisr,
>                               64, i)) < 64) {
>         struct pending_irq *p, *n;
>         int cpu, eoi;
>
>         cpu = -1;
>         eoi = 0;
>
>         spin_lock_irq(&gic.lock);
>         lr = GICH[GICH_LR + i];
>         virq = lr & GICH_LR_VIRTUAL_MASK;
>
>         p = irq_to_pending(v, virq);
>         if ( p->desc != NULL ) {
>             p->desc->status &= ~IRQ_INPROGRESS;
>             /* Assume only one pcpu needs to EOI the irq */
>             cpu = p->desc->arch.eoi_cpu;
>             eoi = 1;
>             pirq = p->desc->irq;
>         }
>         if ( !atomic_dec_and_test(&p->inflight_cnt) )
>         {
>             /* Physical IRQ can't be reinject */
>             WARN_ON(p->desc != NULL);
>             gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority);
>             spin_unlock_irq(&gic.lock);
>             i++;
>             continue;
>         }
>
>         GICH[GICH_LR + i] = 0;
>         clear_bit(i, &this_cpu(lr_mask));
>
>         if ( !list_empty(&v->arch.vgic.lr_pending) ) {
>             n = list_entry(v->arch.vgic.lr_pending.next, typeof(*n), lr_queue);
>             gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority);
>             list_del_init(&n->lr_queue);
>             set_bit(i, &this_cpu(lr_mask));
>         } else {
>             gic_inject_irq_stop();
>         }
>         spin_unlock_irq(&gic.lock);
>
>         spin_lock_irq(&v->arch.vgic.lock);
>         list_del_init(&p->inflight);
>         spin_unlock_irq(&v->arch.vgic.lock);
>
>         if ( eoi ) {
>             /* this is not racy because we can't receive another irq of the
>              * same type until we EOI it. */
>             if ( cpu == smp_processor_id() )
>                 gic_irq_eoi((void*)(uintptr_t)pirq);
>             else
>                 on_selected_cpus(cpumask_of(cpu),
>                                  gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
>         }
>
>         i++;
>     }
> }
>
> Oleksandr Tyshchenko | Embedded Developer
> GlobalLogic

^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix 2014-01-30 13:24 ` Stefano Stabellini @ 2014-01-30 15:06 ` Oleksandr Tyshchenko 2014-01-30 15:35 ` Stefano Stabellini 0 siblings, 1 reply; 48+ messages in thread From: Oleksandr Tyshchenko @ 2014-01-30 15:06 UTC (permalink / raw) To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel According to DT it is a level irq (DT_IRQ_TYPE_LEVEL_HIGH) On Thu, Jan 30, 2014 at 3:24 PM, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote: > Is it a level or an edge irq? > > On Wed, 29 Jan 2014, Julien Grall wrote: >> Hi, >> >> It's weird, physical IRQ should not be injected twice ... >> Were you able to print the IRQ number? >> >> In any case, you are using the old version of the interrupt patch series. >> Your new error may come of race condition in this code. >> >> Can you try to use a newest version? >> >> On 29 Jan 2014 18:40, "Oleksandr Tyshchenko" <oleksandr.tyshchenko@globallogic.com> wrote: >> > Right, that's why changing it to cpumask_of(0) shouldn't make any >> > difference for xen-unstable (it should make things clearer, if nothing >> > else) but it should fix things for Oleksandr. >> >> Unfortunately, it is not enough for stable work. >> >> I was tried to use cpumask_of(smp_processor_id()) instead of cpumask_of(0) in >> gic_route_irq_to_guest(). And as result, I don't see our situation >> which cause to deadlock in on_selected_cpus function (expected). >> But, hypervisor sometimes hangs somewhere else (I have not identified >> yet where this is happening) or I sometimes see traps, like that: >> ("WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads to them) >> >> (XEN) CPU1: Unexpected Trap: Undefined Instruction >> (XEN) ----[ Xen-4.4-unstable arm32 debug=y Not tainted ]---- >> (XEN) CPU: 1 >> (XEN) PC: 00242c1c __warn+0x20/0x28 >> (XEN) CPSR: 200001da MODE:Hypervisor >> (XEN) R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff >> (XEN) R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000 >> (XEN) R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc R12:00000002 >> (XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c >> (XEN) >> (XEN) VTCR_EL2: 80002558 >> (XEN) VTTBR_EL2: 00020000dec6a000 >> (XEN) >> (XEN) SCTLR_EL2: 30cd187f >> (XEN) HCR_EL2: 00000000000028b5 >> (XEN) TTBR0_EL2: 00000000d2014000 >> (XEN) >> (XEN) ESR_EL2: 00000000 >> (XEN) HPFAR_EL2: 0000000000482110 >> (XEN) HDFAR: fa211190 >> (XEN) HIFAR: 00000000 >> (XEN) >> (XEN) Xen stack trace from sp=4bfd7eb4: >> (XEN) 0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 00000097 00000001 >> (XEN) 00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 00000019 00000000 >> (XEN) 40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 4bfd7f58 00000000 >> (XEN) 00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 00000097 00000097 >> (XEN) 00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 0024f4b8 4bfd7f58 >> (XEN) 00251830 ea80c950 00000000 00000001 c0079a90 00000097 00000097 00000000 >> (XEN) fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c ffffffff b6efbca3 >> (XEN) c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 e9879eb8 c007680c >> (XEN) c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 00000000 00000000 >> (XEN) 00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 40000193 00000000 >> (XEN) ffeffbfe fedeefff fffd5ffe >> (XEN) Xen call trace: >> (XEN) [<00242c1c>] __warn+0x20/0x28 (PC) >> (XEN) [<00242c1c>] __warn+0x20/0x28 (LR) >> (XEN) [<00247a54>] maintenance_interrupt+0xfc/0x2f4 >> (XEN) [<00248e60>] do_IRQ+0x138/0x198 >> (XEN) 
[<00248978>] gic_interrupt+0x58/0xc0 >> (XEN) [<0024f4b8>] do_trap_irq+0x10/0x14 >> (XEN) [<00251830>] return_from_trap+0/0x4 >> (XEN) >> >> Also I am posting maintenance_interrupt() from my tree: >> >> static void maintenance_interrupt(int irq, void *dev_id, struct >> cpu_user_regs *regs) >> { >> int i = 0, virq, pirq; >> uint32_t lr; >> struct vcpu *v = current; >> uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) GICH[GICH_EISR1]) << 32); >> >> while ((i = find_next_bit((const long unsigned int *) &eisr, >> 64, i)) < 64) { >> struct pending_irq *p, *n; >> int cpu, eoi; >> >> cpu = -1; >> eoi = 0; >> >> spin_lock_irq(&gic.lock); >> lr = GICH[GICH_LR + i]; >> virq = lr & GICH_LR_VIRTUAL_MASK; >> >> p = irq_to_pending(v, virq); >> if ( p->desc != NULL ) { >> p->desc->status &= ~IRQ_INPROGRESS; >> /* Assume only one pcpu needs to EOI the irq */ >> cpu = p->desc->arch.eoi_cpu; >> eoi = 1; >> pirq = p->desc->irq; >> } >> if ( !atomic_dec_and_test(&p->inflight_cnt) ) >> { >> /* Physical IRQ can't be reinject */ >> WARN_ON(p->desc != NULL); >> gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority); >> spin_unlock_irq(&gic.lock); >> i++; >> continue; >> } >> >> GICH[GICH_LR + i] = 0; >> clear_bit(i, &this_cpu(lr_mask)); >> >> if ( !list_empty(&v->arch.vgic.lr_pending) ) { >> n = list_entry(v->arch.vgic.lr_pending.next, typeof(*n), lr_queue); >> gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority); >> list_del_init(&n->lr_queue); >> set_bit(i, &this_cpu(lr_mask)); >> } else { >> gic_inject_irq_stop(); >> } >> spin_unlock_irq(&gic.lock); >> >> spin_lock_irq(&v->arch.vgic.lock); >> list_del_init(&p->inflight); >> spin_unlock_irq(&v->arch.vgic.lock); >> >> if ( eoi ) { >> /* this is not racy because we can't receive another irq of the >> * same type until we EOI it. */ >> if ( cpu == smp_processor_id() ) >> gic_irq_eoi((void*)(uintptr_t)pirq); >> else >> on_selected_cpus(cpumask_of(cpu), >> gic_irq_eoi, (void*)(uintptr_t)pirq, 0); >> } >> >> i++; >> } >> } >> >> >> Oleksandr Tyshchenko | Embedded Developer >> GlobalLogic >> >> >> -- Name | Title GlobalLogic P +x.xxx.xxx.xxxx M +x.xxx.xxx.xxxx S skype www.globallogic.com http://www.globallogic.com/email_disclaimer.txt ^ permalink raw reply [flat|nested] 48+ messages in thread
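For reference, the trigger type from the device tree ends up in the distributor's configuration registers. A sketch of that step, assuming the GICv2 layout (two configuration bits per interrupt, sixteen interrupts per GICD_ICFGR word) and Xen's convention of indexing GICD as an array of 32-bit words:

/* Program level (0b00) versus edge (0b10) triggering for one interrupt. */
static void set_irq_trigger(unsigned int irq, bool level)
{
    uint32_t cfg = GICD[GICD_ICFGR + irq / 16];

    cfg &= ~(0x3u << ((irq % 16) * 2));   /* clear both config bits */
    if ( !level )
        cfg |= 0x2u << ((irq % 16) * 2);  /* mark as edge-triggered */

    GICD[GICD_ICFGR + irq / 16] = cfg;
}

For a level-high line such as IRQ 151 here, the GIC keeps signalling the interrupt as long as the device asserts it, which is why the deactivation point discussed next matters.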
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-30 15:06 ` Oleksandr Tyshchenko
@ 2014-01-30 15:35 ` Stefano Stabellini
  2014-01-30 16:10 ` Oleksandr Tyshchenko
  0 siblings, 1 reply; 48+ messages in thread
From: Stefano Stabellini @ 2014-01-30 15:35 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

Given that we don't deactivate the interrupt (writing to GICC_DIR) until
the guest EOIs it, I can't understand how you manage to get a second
interrupt notification before the guest EOIs the first one.

Do you set GICC_CTL_EOI in GICC_CTLR?

On Thu, 30 Jan 2014, Oleksandr Tyshchenko wrote:
> According to DT it is a level irq (DT_IRQ_TYPE_LEVEL_HIGH)
>
> On Thu, Jan 30, 2014 at 3:24 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > Is it a level or an edge irq?
> >
> > On Wed, 29 Jan 2014, Julien Grall wrote:
> >> Hi,
> >>
> >> It's weird, physical IRQ should not be injected twice ...
> >> Were you able to print the IRQ number?
> >>
> >> In any case, you are using the old version of the interrupt patch series.
> >> Your new error may come of race condition in this code.
> >>
> >> Can you try to use a newest version?
> >>
> >> On 29 Jan 2014 18:40, "Oleksandr Tyshchenko" <oleksandr.tyshchenko@globallogic.com> wrote:
> >> > Right, that's why changing it to cpumask_of(0) shouldn't make any
> >> > difference for xen-unstable (it should make things clearer, if nothing
> >> > else) but it should fix things for Oleksandr.
> >>
> >> Unfortunately, it is not enough for stable work.
> >>
> >> I was tried to use cpumask_of(smp_processor_id()) instead of cpumask_of(0) in
> >> gic_route_irq_to_guest(). And as result, I don't see our situation
> >> which cause to deadlock in on_selected_cpus function (expected).
> >> But, hypervisor sometimes hangs somewhere else (I have not identified > >> yet where this is happening) or I sometimes see traps, like that: > >> ("WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads to them) > >> > >> (XEN) CPU1: Unexpected Trap: Undefined Instruction > >> (XEN) ----[ Xen-4.4-unstable arm32 debug=y Not tainted ]---- > >> (XEN) CPU: 1 > >> (XEN) PC: 00242c1c __warn+0x20/0x28 > >> (XEN) CPSR: 200001da MODE:Hypervisor > >> (XEN) R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff > >> (XEN) R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000 > >> (XEN) R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc R12:00000002 > >> (XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c > >> (XEN) > >> (XEN) VTCR_EL2: 80002558 > >> (XEN) VTTBR_EL2: 00020000dec6a000 > >> (XEN) > >> (XEN) SCTLR_EL2: 30cd187f > >> (XEN) HCR_EL2: 00000000000028b5 > >> (XEN) TTBR0_EL2: 00000000d2014000 > >> (XEN) > >> (XEN) ESR_EL2: 00000000 > >> (XEN) HPFAR_EL2: 0000000000482110 > >> (XEN) HDFAR: fa211190 > >> (XEN) HIFAR: 00000000 > >> (XEN) > >> (XEN) Xen stack trace from sp=4bfd7eb4: > >> (XEN) 0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 00000097 00000001 > >> (XEN) 00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 00000019 00000000 > >> (XEN) 40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 4bfd7f58 00000000 > >> (XEN) 00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 00000097 00000097 > >> (XEN) 00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 0024f4b8 4bfd7f58 > >> (XEN) 00251830 ea80c950 00000000 00000001 c0079a90 00000097 00000097 00000000 > >> (XEN) fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c ffffffff b6efbca3 > >> (XEN) c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 e9879eb8 c007680c > >> (XEN) c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 00000000 00000000 > >> (XEN) 00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 40000193 00000000 > >> (XEN) ffeffbfe fedeefff fffd5ffe > >> (XEN) Xen call trace: > >> (XEN) [<00242c1c>] __warn+0x20/0x28 (PC) > >> (XEN) [<00242c1c>] __warn+0x20/0x28 (LR) > >> (XEN) [<00247a54>] maintenance_interrupt+0xfc/0x2f4 > >> (XEN) [<00248e60>] do_IRQ+0x138/0x198 > >> (XEN) [<00248978>] gic_interrupt+0x58/0xc0 > >> (XEN) [<0024f4b8>] do_trap_irq+0x10/0x14 > >> (XEN) [<00251830>] return_from_trap+0/0x4 > >> (XEN) > >> > >> Also I am posting maintenance_interrupt() from my tree: > >> > >> static void maintenance_interrupt(int irq, void *dev_id, struct > >> cpu_user_regs *regs) > >> { > >> int i = 0, virq, pirq; > >> uint32_t lr; > >> struct vcpu *v = current; > >> uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) GICH[GICH_EISR1]) << 32); > >> > >> while ((i = find_next_bit((const long unsigned int *) &eisr, > >> 64, i)) < 64) { > >> struct pending_irq *p, *n; > >> int cpu, eoi; > >> > >> cpu = -1; > >> eoi = 0; > >> > >> spin_lock_irq(&gic.lock); > >> lr = GICH[GICH_LR + i]; > >> virq = lr & GICH_LR_VIRTUAL_MASK; > >> > >> p = irq_to_pending(v, virq); > >> if ( p->desc != NULL ) { > >> p->desc->status &= ~IRQ_INPROGRESS; > >> /* Assume only one pcpu needs to EOI the irq */ > >> cpu = p->desc->arch.eoi_cpu; > >> eoi = 1; > >> pirq = p->desc->irq; > >> } > >> if ( !atomic_dec_and_test(&p->inflight_cnt) ) > >> { > >> /* Physical IRQ can't be reinject */ > >> WARN_ON(p->desc != NULL); > >> gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority); > >> spin_unlock_irq(&gic.lock); > >> i++; > >> continue; > >> } > >> > >> GICH[GICH_LR + i] = 0; > >> clear_bit(i, &this_cpu(lr_mask)); > >> > >> if ( !list_empty(&v->arch.vgic.lr_pending) ) 
{ > >> n = list_entry(v->arch.vgic.lr_pending.next, typeof(*n), lr_queue); > >> gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority); > >> list_del_init(&n->lr_queue); > >> set_bit(i, &this_cpu(lr_mask)); > >> } else { > >> gic_inject_irq_stop(); > >> } > >> spin_unlock_irq(&gic.lock); > >> > >> spin_lock_irq(&v->arch.vgic.lock); > >> list_del_init(&p->inflight); > >> spin_unlock_irq(&v->arch.vgic.lock); > >> > >> if ( eoi ) { > >> /* this is not racy because we can't receive another irq of the > >> * same type until we EOI it. */ > >> if ( cpu == smp_processor_id() ) > >> gic_irq_eoi((void*)(uintptr_t)pirq); > >> else > >> on_selected_cpus(cpumask_of(cpu), > >> gic_irq_eoi, (void*)(uintptr_t)pirq, 0); > >> } > >> > >> i++; > >> } > >> } > >> > >> > >> Oleksandr Tyshchenko | Embedded Developer > >> GlobalLogic > >> > >> > >> > > > > -- > > Name | Title > GlobalLogic > P +x.xxx.xxx.xxxx M +x.xxx.xxx.xxxx S skype > www.globallogic.com > > http://www.globallogic.com/email_disclaimer.txt > ^ permalink raw reply [flat|nested] 48+ messages in thread
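Stefano's two questions are connected. Here is the flow he is describing, sketched with GICv2 register names; the macro GICC_CTL_EOI is assumed to be Xen's name for the EOImode bit in GICC_CTLR:

/* Boot: enable the CPU interface with split priority-drop/deactivation. */
GICC[GICC_CTLR] = GICC_CTL_ENABLE | GICC_CTL_EOI;

/* Taking a physical IRQ destined for a guest: */
irq = GICC[GICC_IAR] & 0x3ff;   /* acknowledge: the IRQ becomes active  */
GICC[GICC_EOIR] = irq;          /* priority drop only: it STAYS active  */
/* ... inject the corresponding virtual IRQ into the guest ... */

/* Only once the guest EOIs the virtual IRQ (maintenance path): */
GICC[GICC_DIR] = irq;           /* deactivate: the line may fire again  */

While the interrupt is still active, a level-triggered source cannot raise a second notification, which is why a WARN_ON firing for a hardware IRQ is surprising here.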
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
  2014-01-30 15:35 ` Stefano Stabellini
@ 2014-01-30 16:10 ` Oleksandr Tyshchenko
  2014-01-30 17:18 ` Stefano Stabellini
  0 siblings, 1 reply; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2014-01-30 16:10 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

Hello, all.

1. The "simultaneous cross-interrupts" issue only occurs if
gic_route_irq_to_guest() was called on CPU1 during boot. This is possible
in our case, since gic_route_irq_to_guest() is called while both domains
(dom0 and domU) are being built, unlike in Xen upstream. So the
"impropriety" is on our side. The following change fixes the side effect
(the deadlock):

while ( unlikely(!spin_trylock(&call_lock)) )
    smp_call_function_interrupt();

1.1 I have checked the patch "xen: arm: increase priority of SGIs used as
IPIs". In general it works (I mean that it does not cause new issues), but
it does not fix the issue: I still see the "simultaneous cross-interrupts".

1.2 I have also checked the solution where the on_selected_cpus call was
moved out of the interrupt handler. Unfortunately, it does not work; I
almost immediately see the following error:

(XEN) Assertion 'this_cpu(eoi_irq) == NULL' failed, line 981, file gic.c
(XEN) Xen BUG at gic.c:981
(XEN) CPU1: Unexpected Trap: Undefined Instruction
(XEN) ----[ Xen-4.4-unstable arm32 debug=y Not tainted ]----
(XEN) CPU: 1
(XEN) PC: 00241ee0 __bug+0x2c/0x44
(XEN) CPSR: 2000015a MODE:Hypervisor
(XEN) R0: 0026770c R1: 00000000 R2: 3fd2fd00 R3: 00000fff
(XEN) R4: 00263248 R5: 00264384 R6: 000003d5 R7: 4003d000
(XEN) R8: 00000001 R9: 00000091 R10:00000000 R11:40037ebc R12:00000001
(XEN) HYP: SP: 40037eb4 LR: 00241ee0
(XEN)
(XEN) VTCR_EL2: 80002558
(XEN) VTTBR_EL2: 00010000deffc000
(XEN)
(XEN) SCTLR_EL2: 30cd187f
(XEN) HCR_EL2: 0000000000002835
(XEN) TTBR0_EL2: 00000000d2014000
(XEN)
(XEN) ESR_EL2: 00000000
(XEN) HPFAR_EL2: 0000000000482110
(XEN) HDFAR: fa211f00
(XEN) HIFAR: 00000000
(XEN)
(XEN) Xen stack trace from sp=40037eb4:
(XEN) 00000000 40037efc 00247e1c 002e6610 002e6610 002e6608 002e6608 00000001
(XEN) 00000000 40015000 40017000 40005f60 40017014 40037f58 00000019 00000000
(XEN) 40005f60 40037f24 00249068 00000009 00000019 00404000 40037f58 00000000
(XEN) 00405000 00004680 002e7694 40037f4c 00248b80 00000000 c5b72000 00000091
(XEN) 00000000 c700d4e0 c008477c 000000f1 00000001 40037f54 0024f6c0 40037f58
(XEN) 00251a30 c700d4e0 00000001 c008477c 00000000 c5b72000 00000091 00000000
(XEN) c700d4e0 c008477c 000000f1 00000001 00000001 c5b72000 ffffffff 0000a923
(XEN) c0077ac4 60000193 00000000 b6eadaa0 c0578f40 c00138c0 c5b73f58 c036ab90
(XEN) c0578f4c c00136a0 c0578f58 c0013920 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 80000010 60000193 a0000093 80000193 00000000
(XEN) 00000000 0c41e00c 450c2880
(XEN) Xen call trace:
(XEN) [<00241ee0>] __bug+0x2c/0x44 (PC)
(XEN) [<00241ee0>] __bug+0x2c/0x44 (LR)
(XEN) [<00247e1c>] maintenance_interrupt+0x2e8/0x328
(XEN) [<00249068>] do_IRQ+0x138/0x198
(XEN) [<00248b80>] gic_interrupt+0x58/0xc0
(XEN) [<0024f6c0>] do_trap_irq+0x10/0x14
(XEN) [<00251a30>] return_from_trap+0/0x4
(XEN)

2. The "simultaneous cross-interrupts" issue does not occur if I use the
following change, so I no longer hit the deadlock in on_selected_cpus():

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index e6257a7..af96a31 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -776,8 +795,7 @@ int gic_route_irq_to_guest(struct domain *d, const
struct dt_irq *irq,

     level = dt_irq_is_level_triggered(irq);

-    gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()),
-                           0xa0);
+    gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0);

     retval = __setup_irq(desc, irq->irq, action);
     if (retval) {

But I rarely see deadlocks in other parts related to interrupt handling.
As noted by Julien, I am using an old version of the interrupt patch
series; I completely agree.

We are based on the following Xen commit:
48249a1 libxl: Avoid realloc(,0) when libxl__xs_directory returns empty list

We also have some patches that we cherry-picked when we urgently needed them:
6bba1a3 xen/arm: Keep count of inflight interrupts
33a8aa9 xen/arm: Only enable physical IRQs when the guest asks
b6a4e65 xen/arm: Rename gic_irq_{startup, shutdown} to gic_irq_{mask, unmask}
5dbe455 xen/arm: Don't reinject the IRQ if it's already in LRs
1438f03 xen/arm: Physical IRQ is not always equal to virtual IRQ

I still have to apply the following patches and check with them:
88eb95e xen/arm: disable a physical IRQ when the guest disables the
corresponding IRQ
a660ee3 xen/arm: implement gic_irq_enable and gic_irq_disable
1dc9556 xen/arm: do not add a second irq to the LRs if one is already present
d16d511 xen/arm: track the state of guest IRQs

I will report the results; I hope to do it today.

Many thanks to all.

On Thu, Jan 30, 2014 at 5:35 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> Given that we don't deactivate the interrupt (writing to GICC_DIR) until
> the guest EOIs it, I can't understand how you manage to get a second
> interrupt notifications before the guest EOIs the first one.
>
> Do you set GICC_CTL_EOI in GICC_CTLR?
>
> On Thu, 30 Jan 2014, Oleksandr Tyshchenko wrote:
> >> According to DT it is a level irq (DT_IRQ_TYPE_LEVEL_HIGH)
> >>
> >> On Thu, Jan 30, 2014 at 3:24 PM, Stefano Stabellini
> >> <stefano.stabellini@eu.citrix.com> wrote:
> >> > Is it a level or an edge irq?
> >> >
> >> > On Wed, 29 Jan 2014, Julien Grall wrote:
> >> >> Hi,
> >> >>
> >> >> It's weird, physical IRQ should not be injected twice ...
> >> >> Were you able to print the IRQ number?
> >> >>
> >> >> In any case, you are using the old version of the interrupt patch series.
> >> >> Your new error may come of race condition in this code.
> >> >>
> >> >> Can you try to use a newest version?
> >> >>
> >> >> On 29 Jan 2014 18:40, "Oleksandr Tyshchenko" <oleksandr.tyshchenko@globallogic.com> wrote:
> >> >> > Right, that's why changing it to cpumask_of(0) shouldn't make any
> >> >> > difference for xen-unstable (it should make things clearer, if nothing
> >> >> > else) but it should fix things for Oleksandr.
> >> >>
> >> >> Unfortunately, it is not enough for stable work.
> >> >>
> >> >> I was tried to use cpumask_of(smp_processor_id()) instead of cpumask_of(0) in
> >> >> gic_route_irq_to_guest(). And as result, I don't see our situation
> >> >> which cause to deadlock in on_selected_cpus function (expected).
>> >> But, hypervisor sometimes hangs somewhere else (I have not identified >> >> yet where this is happening) or I sometimes see traps, like that: >> >> ("WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads to them) >> >> >> >> (XEN) CPU1: Unexpected Trap: Undefined Instruction >> >> (XEN) ----[ Xen-4.4-unstable arm32 debug=y Not tainted ]---- >> >> (XEN) CPU: 1 >> >> (XEN) PC: 00242c1c __warn+0x20/0x28 >> >> (XEN) CPSR: 200001da MODE:Hypervisor >> >> (XEN) R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff >> >> (XEN) R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000 >> >> (XEN) R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc R12:00000002 >> >> (XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c >> >> (XEN) >> >> (XEN) VTCR_EL2: 80002558 >> >> (XEN) VTTBR_EL2: 00020000dec6a000 >> >> (XEN) >> >> (XEN) SCTLR_EL2: 30cd187f >> >> (XEN) HCR_EL2: 00000000000028b5 >> >> (XEN) TTBR0_EL2: 00000000d2014000 >> >> (XEN) >> >> (XEN) ESR_EL2: 00000000 >> >> (XEN) HPFAR_EL2: 0000000000482110 >> >> (XEN) HDFAR: fa211190 >> >> (XEN) HIFAR: 00000000 >> >> (XEN) >> >> (XEN) Xen stack trace from sp=4bfd7eb4: >> >> (XEN) 0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 00000097 00000001 >> >> (XEN) 00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 00000019 00000000 >> >> (XEN) 40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 4bfd7f58 00000000 >> >> (XEN) 00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 00000097 00000097 >> >> (XEN) 00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 0024f4b8 4bfd7f58 >> >> (XEN) 00251830 ea80c950 00000000 00000001 c0079a90 00000097 00000097 00000000 >> >> (XEN) fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c ffffffff b6efbca3 >> >> (XEN) c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 e9879eb8 c007680c >> >> (XEN) c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 00000000 00000000 >> >> (XEN) 00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 40000193 00000000 >> >> (XEN) ffeffbfe fedeefff fffd5ffe >> >> (XEN) Xen call trace: >> >> (XEN) [<00242c1c>] __warn+0x20/0x28 (PC) >> >> (XEN) [<00242c1c>] __warn+0x20/0x28 (LR) >> >> (XEN) [<00247a54>] maintenance_interrupt+0xfc/0x2f4 >> >> (XEN) [<00248e60>] do_IRQ+0x138/0x198 >> >> (XEN) [<00248978>] gic_interrupt+0x58/0xc0 >> >> (XEN) [<0024f4b8>] do_trap_irq+0x10/0x14 >> >> (XEN) [<00251830>] return_from_trap+0/0x4 >> >> (XEN) >> >> >> >> Also I am posting maintenance_interrupt() from my tree: >> >> >> >> static void maintenance_interrupt(int irq, void *dev_id, struct >> >> cpu_user_regs *regs) >> >> { >> >> int i = 0, virq, pirq; >> >> uint32_t lr; >> >> struct vcpu *v = current; >> >> uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) GICH[GICH_EISR1]) << 32); >> >> >> >> while ((i = find_next_bit((const long unsigned int *) &eisr, >> >> 64, i)) < 64) { >> >> struct pending_irq *p, *n; >> >> int cpu, eoi; >> >> >> >> cpu = -1; >> >> eoi = 0; >> >> >> >> spin_lock_irq(&gic.lock); >> >> lr = GICH[GICH_LR + i]; >> >> virq = lr & GICH_LR_VIRTUAL_MASK; >> >> >> >> p = irq_to_pending(v, virq); >> >> if ( p->desc != NULL ) { >> >> p->desc->status &= ~IRQ_INPROGRESS; >> >> /* Assume only one pcpu needs to EOI the irq */ >> >> cpu = p->desc->arch.eoi_cpu; >> >> eoi = 1; >> >> pirq = p->desc->irq; >> >> } >> >> if ( !atomic_dec_and_test(&p->inflight_cnt) ) >> >> { >> >> /* Physical IRQ can't be reinject */ >> >> WARN_ON(p->desc != NULL); >> >> gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority); >> >> spin_unlock_irq(&gic.lock); >> >> i++; >> >> continue; >> >> } >> >> >> >> GICH[GICH_LR + i] = 0; >> >> 
clear_bit(i, &this_cpu(lr_mask)); >> >> >> >> if ( !list_empty(&v->arch.vgic.lr_pending) ) { >> >> n = list_entry(v->arch.vgic.lr_pending.next, typeof(*n), lr_queue); >> >> gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority); >> >> list_del_init(&n->lr_queue); >> >> set_bit(i, &this_cpu(lr_mask)); >> >> } else { >> >> gic_inject_irq_stop(); >> >> } >> >> spin_unlock_irq(&gic.lock); >> >> >> >> spin_lock_irq(&v->arch.vgic.lock); >> >> list_del_init(&p->inflight); >> >> spin_unlock_irq(&v->arch.vgic.lock); >> >> >> >> if ( eoi ) { >> >> /* this is not racy because we can't receive another irq of the >> >> * same type until we EOI it. */ >> >> if ( cpu == smp_processor_id() ) >> >> gic_irq_eoi((void*)(uintptr_t)pirq); >> >> else >> >> on_selected_cpus(cpumask_of(cpu), >> >> gic_irq_eoi, (void*)(uintptr_t)pirq, 0); >> >> } >> >> >> >> i++; >> >> } >> >> } >> >> >> >> >> >> Oleksandr Tyshchenko | Embedded Developer >> >> GlobalLogic >> >> >> >> >> >> >> >> >> >> -- >> >> Name | Title >> GlobalLogic >> P +x.xxx.xxx.xxxx M +x.xxx.xxx.xxxx S skype >> www.globallogic.com >> >> http://www.globallogic.com/email_disclaimer.txt >> -- Name | Title GlobalLogic P +x.xxx.xxx.xxxx M +x.xxx.xxx.xxxx S skype www.globallogic.com http://www.globallogic.com/email_disclaimer.txt ^ permalink raw reply related [flat|nested] 48+ messages in thread
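For context on the trylock workaround quoted in point 1 above, here is where it sits, sketched against the shape of on_selected_cpus() in xen/common/smp.c (the body shown is an approximation, not the exact upstream code):

void on_selected_cpus(const cpumask_t *selected, void (*func)(void *),
                      void *data, int wait)
{
    /* Instead of blocking on call_lock, service any cross-call already
     * in flight: if this CPU is one of its targets, running the handler
     * clears us from call_data.selected, which lets the current lock
     * holder finish its busy-wait below and release the lock. */
    while ( unlikely(!spin_trylock(&call_lock)) )
        smp_call_function_interrupt();

    cpumask_copy(&call_data.selected, selected);
    call_data.func = func;
    call_data.info = data;
    call_data.wait = wait;

    smp_send_call_function_mask(&call_data.selected);

    /* Wait for every selected CPU to run func and clear itself. */
    while ( !cpumask_empty(&call_data.selected) )
        cpu_relax();

    spin_unlock(&call_lock);
}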
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
2014-01-30 16:10 ` Oleksandr Tyshchenko
@ 2014-01-30 17:18 ` Stefano Stabellini
2014-01-30 19:54 ` Oleksandr Tyshchenko
0 siblings, 1 reply; 48+ messages in thread
From: Stefano Stabellini @ 2014-01-30 17:18 UTC (permalink / raw)
To: Oleksandr Tyshchenko
Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

On Thu, 30 Jan 2014, Oleksandr Tyshchenko wrote:
> 1.2 I have also checked a solution where the on_selected_cpus call was
> moved out of the interrupt handler. Unfortunately, it doesn't work.
>
> I almost immediately see the following error:
> (XEN) Assertion 'this_cpu(eoi_irq) == NULL' failed, line 981, file gic.c
> (XEN) Xen BUG at gic.c:981
> (XEN) CPU1: Unexpected Trap: Undefined Instruction
> (XEN) ----[ Xen-4.4-unstable arm32 debug=y Not tainted ]----
> (XEN) CPU: 1
> (XEN) PC: 00241ee0 __bug+0x2c/0x44
> (XEN) CPSR: 2000015a MODE:Hypervisor
> (XEN) R0: 0026770c R1: 00000000 R2: 3fd2fd00 R3: 00000fff
> (XEN) R4: 00263248 R5: 00264384 R6: 000003d5 R7: 4003d000
> (XEN) R8: 00000001 R9: 00000091 R10:00000000 R11:40037ebc R12:00000001
> (XEN) HYP: SP: 40037eb4 LR: 00241ee0
> (XEN)
> (XEN) VTCR_EL2: 80002558
> (XEN) VTTBR_EL2: 00010000deffc000
> (XEN)
> (XEN) SCTLR_EL2: 30cd187f
> (XEN) HCR_EL2: 0000000000002835
> (XEN) TTBR0_EL2: 00000000d2014000
> (XEN)
> (XEN) ESR_EL2: 00000000
> (XEN) HPFAR_EL2: 0000000000482110
> (XEN) HDFAR: fa211f00
> (XEN) HIFAR: 00000000
> (XEN)
> (XEN) Xen stack trace from sp=40037eb4:
> (XEN) 00000000 40037efc 00247e1c 002e6610 002e6610 002e6608 002e6608 00000001
> (XEN) 00000000 40015000 40017000 40005f60 40017014 40037f58 00000019 00000000
> (XEN) 40005f60 40037f24 00249068 00000009 00000019 00404000 40037f58 00000000
> (XEN) 00405000 00004680 002e7694 40037f4c 00248b80 00000000 c5b72000 00000091
> (XEN) 00000000 c700d4e0 c008477c 000000f1 00000001 40037f54 0024f6c0 40037f58
> (XEN) 00251a30 c700d4e0 00000001 c008477c 00000000 c5b72000 00000091 00000000
> (XEN) c700d4e0 c008477c 000000f1 00000001 00000001 c5b72000 ffffffff 0000a923
> (XEN) c0077ac4 60000193 00000000 b6eadaa0 c0578f40 c00138c0 c5b73f58 c036ab90
> (XEN) c0578f4c c00136a0 c0578f58 c0013920 00000000 00000000 00000000 00000000
> (XEN) 00000000 00000000 00000000 80000010 60000193 a0000093 80000193 00000000
> (XEN) 00000000 0c41e00c 450c2880
> (XEN) Xen call trace:
> (XEN) [<00241ee0>] __bug+0x2c/0x44 (PC)
> (XEN) [<00241ee0>] __bug+0x2c/0x44 (LR)
> (XEN) [<00247e1c>] maintenance_interrupt+0x2e8/0x328
> (XEN) [<00249068>] do_IRQ+0x138/0x198
> (XEN) [<00248b80>] gic_interrupt+0x58/0xc0
> (XEN) [<0024f6c0>] do_trap_irq+0x10/0x14
> (XEN) [<00251a30>] return_from_trap+0/0x4
> (XEN)

Are you seeing more than one interrupt being EOI'ed with a single
maintenance interrupt? I didn't think that was possible in practice.
If so, we might have to turn eoi_irq into a list or an array.

> 2. The "simultaneous cross-interrupts" issue doesn't occur if I use
> the following solution:
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index e6257a7..af96a31 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -776,8 +795,7 @@ int gic_route_irq_to_guest(struct domain *d, const
> struct dt_irq *irq,
>
>      level = dt_irq_is_level_triggered(irq);
>
> -    gic_set_irq_properties(irq->irq, level, cpumask_of(smp_processor_id()),
> -                           0xa0);
> +    gic_set_irq_properties(irq->irq, level, cpumask_of(0), 0xa0);
>
>      retval = __setup_irq(desc, irq->irq, action);
>      if (retval) {
>
> So, as a result, I don't see the deadlock in on_selected_cpus().

As I stated before, I think this is a good change to have in 4.4.

> But, rarely, I see deadlocks in other parts related to interrupt handling.
> As noted by Julien, I am using the old version of the interrupt patch
> series. I completely agree.
>
> We are based on the following Xen commit:
> 48249a1 libxl: Avoid realloc(,0) when libxl__xs_directory returns empty list
>
> Also we have some patches, which we cherry-picked when we urgently needed them:
> 6bba1a3 xen/arm: Keep count of inflight interrupts
> 33a8aa9 xen/arm: Only enable physical IRQs when the guest asks
> b6a4e65 xen/arm: Rename gic_irq_{startup, shutdown} to gic_irq_{mask, unmask}
> 5dbe455 xen/arm: Don't reinject the IRQ if it's already in LRs
> 1438f03 xen/arm: Physical IRQ is not always equal to virtual IRQ
>
> I have to apply the following patches and check with them:
> 88eb95e xen/arm: disable a physical IRQ when the guest disables the
> corresponding IRQ
> a660ee3 xen/arm: implement gic_irq_enable and gic_irq_disable
> 1dc9556 xen/arm: do not add a second irq to the LRs if one is already present
> d16d511 xen/arm: track the state of guest IRQs
>
> I'll report the results. I hope to do so today.

I am looking forward to reading your report.

Cheers,

Stefano

> Many thanks to all.
>
> On Thu, Jan 30, 2014 at 5:35 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > Given that we don't deactivate the interrupt (writing to GICC_DIR) until
> > the guest EOIs it, I can't understand how you manage to get a second
> > interrupt notification before the guest EOIs the first one.
> >
> > Do you set GICC_CTL_EOI in GICC_CTLR?
> >
> > On Thu, 30 Jan 2014, Oleksandr Tyshchenko wrote:
> >> According to DT it is a level irq (DT_IRQ_TYPE_LEVEL_HIGH)
> >>
> >> On Thu, Jan 30, 2014 at 3:24 PM, Stefano Stabellini
> >> <stefano.stabellini@eu.citrix.com> wrote:
> >> > Is it a level or an edge irq?
> >> >
> >> > On Wed, 29 Jan 2014, Julien Grall wrote:
> >> >> Hi,
> >> >>
> >> >> It's weird, a physical IRQ should not be injected twice ...
> >> >> Were you able to print the IRQ number?
> >> >>
> >> >> In any case, you are using the old version of the interrupt patch series.
> >> >> Your new error may come from a race condition in this code.
> >> >>
> >> >> Can you try to use the newest version?
> >> >>
> >> >> On 29 Jan 2014 18:40, "Oleksandr Tyshchenko" <oleksandr.tyshchenko@globallogic.com> wrote:
> >> >> > Right, that's why changing it to cpumask_of(0) shouldn't make any
> >> >> > difference for xen-unstable (it should make things clearer, if nothing
> >> >> > else) but it should fix things for Oleksandr.
> >> >>
> >> >> Unfortunately, it is not enough for stable work.
> >> >>
> >> >> I tried to use cpumask_of(smp_processor_id()) instead of cpumask_of(0) in
> >> >> gic_route_irq_to_guest(). And as a result, I don't see the situation
> >> >> which causes the deadlock in the on_selected_cpus function (as expected).
> >> >> But the hypervisor sometimes hangs somewhere else (I have not yet
> >> >> identified where this is happening) or I sometimes see traps like this
> >> >> ("WARN_ON(p->desc != NULL)" in maintenance_interrupt() leads to them):
> >> >>
> >> >> (XEN) CPU1: Unexpected Trap: Undefined Instruction
> >> >> (XEN) ----[ Xen-4.4-unstable arm32 debug=y Not tainted ]----
> >> >> (XEN) CPU: 1
> >> >> (XEN) PC: 00242c1c __warn+0x20/0x28
> >> >> (XEN) CPSR: 200001da MODE:Hypervisor
> >> >> (XEN) R0: 0026770c R1: 00000001 R2: 3fd2fd00 R3: 00000fff
> >> >> (XEN) R4: 00406100 R5: 40020ee0 R6: 00000000 R7: 4bfdf000
> >> >> (XEN) R8: 00000001 R9: 4bfd7ed0 R10:00000001 R11:4bfd7ebc R12:00000002
> >> >> (XEN) HYP: SP: 4bfd7eb4 LR: 00242c1c
> >> >> (XEN)
> >> >> (XEN) VTCR_EL2: 80002558
> >> >> (XEN) VTTBR_EL2: 00020000dec6a000
> >> >> (XEN)
> >> >> (XEN) SCTLR_EL2: 30cd187f
> >> >> (XEN) HCR_EL2: 00000000000028b5
> >> >> (XEN) TTBR0_EL2: 00000000d2014000
> >> >> (XEN)
> >> >> (XEN) ESR_EL2: 00000000
> >> >> (XEN) HPFAR_EL2: 0000000000482110
> >> >> (XEN) HDFAR: fa211190
> >> >> (XEN) HIFAR: 00000000
> >> >> (XEN)
> >> >> (XEN) Xen stack trace from sp=4bfd7eb4:
> >> >> (XEN) 0026431c 4bfd7efc 00247a54 00000024 002e6608 002e6608 00000097 00000001
> >> >> (XEN) 00000000 4bfd7f54 40017000 40005f60 40017014 4bfd7f58 00000019 00000000
> >> >> (XEN) 40005f60 4bfd7f24 00248e60 00000009 00000019 00404000 4bfd7f58 00000000
> >> >> (XEN) 00405000 000045f0 002e7694 4bfd7f4c 00248978 c0079a90 00000097 00000097
> >> >> (XEN) 00000000 fa212000 ea80c900 00000001 c05b8a60 4bfd7f54 0024f4b8 4bfd7f58
> >> >> (XEN) 00251830 ea80c950 00000000 00000001 c0079a90 00000097 00000097 00000000
> >> >> (XEN) fa212000 ea80c900 00000001 c05b8a60 00000000 e9879e3c ffffffff b6efbca3
> >> >> (XEN) c03b29fc 60000193 9fffffe7 b6c0bbf0 c0607500 c03b3140 e9879eb8 c007680c
> >> >> (XEN) c060750c c03b32c0 c0607518 c03b3360 00000000 00000000 00000000 00000000
> >> >> (XEN) 00000000 00000000 3ff6bebf a0000113 800b0193 800b0093 40000193 00000000
> >> >> (XEN) ffeffbfe fedeefff fffd5ffe
> >> >> (XEN) Xen call trace:
> >> >> (XEN) [<00242c1c>] __warn+0x20/0x28 (PC)
> >> >> (XEN) [<00242c1c>] __warn+0x20/0x28 (LR)
> >> >> (XEN) [<00247a54>] maintenance_interrupt+0xfc/0x2f4
> >> >> (XEN) [<00248e60>] do_IRQ+0x138/0x198
> >> >> (XEN) [<00248978>] gic_interrupt+0x58/0xc0
> >> >> (XEN) [<0024f4b8>] do_trap_irq+0x10/0x14
> >> >> (XEN) [<00251830>] return_from_trap+0/0x4
> >> >> (XEN)
> >> >>
> >> >> Also, I am posting maintenance_interrupt() from my tree:
> >> >>
> >> >> static void maintenance_interrupt(int irq, void *dev_id, struct
> >> >> cpu_user_regs *regs)
> >> >> {
> >> >>     int i = 0, virq, pirq;
> >> >>     uint32_t lr;
> >> >>     struct vcpu *v = current;
> >> >>     uint64_t eisr = GICH[GICH_EISR0] | (((uint64_t) GICH[GICH_EISR1]) << 32);
> >> >>
> >> >>     while ((i = find_next_bit((const long unsigned int *) &eisr,
> >> >>                               64, i)) < 64) {
> >> >>         struct pending_irq *p, *n;
> >> >>         int cpu, eoi;
> >> >>
> >> >>         cpu = -1;
> >> >>         eoi = 0;
> >> >>
> >> >>         spin_lock_irq(&gic.lock);
> >> >>         lr = GICH[GICH_LR + i];
> >> >>         virq = lr & GICH_LR_VIRTUAL_MASK;
> >> >>
> >> >>         p = irq_to_pending(v, virq);
> >> >>         if ( p->desc != NULL ) {
> >> >>             p->desc->status &= ~IRQ_INPROGRESS;
> >> >>             /* Assume only one pcpu needs to EOI the irq */
> >> >>             cpu = p->desc->arch.eoi_cpu;
> >> >>             eoi = 1;
> >> >>             pirq = p->desc->irq;
> >> >>         }
> >> >>         if ( !atomic_dec_and_test(&p->inflight_cnt) )
> >> >>         {
> >> >>             /* Physical IRQ can't be reinjected */
> >> >>             WARN_ON(p->desc != NULL);
> >> >>             gic_set_lr(i, p->irq, GICH_LR_PENDING, p->priority);
> >> >>             spin_unlock_irq(&gic.lock);
> >> >>             i++;
> >> >>             continue;
> >> >>         }
> >> >>
> >> >>         GICH[GICH_LR + i] = 0;
> >> >>         clear_bit(i, &this_cpu(lr_mask));
> >> >>
> >> >>         if ( !list_empty(&v->arch.vgic.lr_pending) ) {
> >> >>             n = list_entry(v->arch.vgic.lr_pending.next, typeof(*n), lr_queue);
> >> >>             gic_set_lr(i, n->irq, GICH_LR_PENDING, n->priority);
> >> >>             list_del_init(&n->lr_queue);
> >> >>             set_bit(i, &this_cpu(lr_mask));
> >> >>         } else {
> >> >>             gic_inject_irq_stop();
> >> >>         }
> >> >>         spin_unlock_irq(&gic.lock);
> >> >>
> >> >>         spin_lock_irq(&v->arch.vgic.lock);
> >> >>         list_del_init(&p->inflight);
> >> >>         spin_unlock_irq(&v->arch.vgic.lock);
> >> >>
> >> >>         if ( eoi ) {
> >> >>             /* this is not racy because we can't receive another irq of the
> >> >>              * same type until we EOI it. */
> >> >>             if ( cpu == smp_processor_id() )
> >> >>                 gic_irq_eoi((void*)(uintptr_t)pirq);
> >> >>             else
> >> >>                 on_selected_cpus(cpumask_of(cpu),
> >> >>                                  gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
> >> >>         }
> >> >>
> >> >>         i++;
> >> >>     }
> >> >> }
> >> >>
> >> >> Oleksandr Tyshchenko | Embedded Developer
> >> >> GlobalLogic

^ permalink raw reply [flat|nested] 48+ messages in thread
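Stefano's suggestion above to turn eoi_irq into a list or an array could look roughly like the following. This is only a hedged sketch in C, not actual Xen 4.4 code: the names pending_eoi, nr_pending_eoi, defer_eoi and flush_pending_eois, the MAX_PENDING_EOI bound, and the per-CPU array declaration style are all assumptions made for illustration.

/*
 * Hypothetical sketch: replace the single per-CPU eoi_irq slot with a
 * small per-CPU array, so one maintenance interrupt can defer the EOI
 * of several IRQs instead of asserting that at most one is pending.
 */
#define MAX_PENDING_EOI 4 /* assumption: bounded by the number of LRs */

static DEFINE_PER_CPU(unsigned int [MAX_PENDING_EOI], pending_eoi);
static DEFINE_PER_CPU(unsigned int, nr_pending_eoi);

static void defer_eoi(unsigned int pirq)
{
    unsigned int *nr = &this_cpu(nr_pending_eoi);

    /* Replaces the old ASSERT(this_cpu(eoi_irq) == NULL). */
    ASSERT(*nr < MAX_PENDING_EOI);
    this_cpu(pending_eoi)[(*nr)++] = pirq;
}

static void flush_pending_eois(void)
{
    unsigned int i;

    for ( i = 0; i < this_cpu(nr_pending_eoi); i++ )
        gic_irq_eoi((void *)(uintptr_t)this_cpu(pending_eoi)[i]);
    this_cpu(nr_pending_eoi) = 0;
}

Whether a fixed-size array is enough depends on exactly the question asked above: how many LR entries can need an EOI from a single maintenance interrupt.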
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
2014-01-30 17:18 ` Stefano Stabellini
@ 2014-01-30 19:54 ` Oleksandr Tyshchenko
2014-01-30 21:47 ` Julien Grall
0 siblings, 1 reply; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2014-01-30 19:54 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

I moved to 4.4.0-rc1, which already has the necessary irq patches,
and applied only one patch ("cpumask_of(0) in gic_route_irq_to_guest").
I see that the hypervisor hangs very often. Unfortunately, I don't
currently have a debugger to localize the part of the code involved,
so I have to use prints, and it may take some time(

On Thu, Jan 30, 2014 at 7:18 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
[..]

^ permalink raw reply [flat|nested] 48+ messages in thread
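As background to the deadlock being chased in this thread: one generic way to break a cross-CPU EOI deadlock on a call-function lock is for the waiting CPU to keep servicing incoming cross-calls while it spins. The following is only a hedged sketch of that idea with the body of on_selected_cpus elided; it is not the actual implementation.

/*
 * Sketch: a CPU that wants call_lock services any pending cross-call
 * while spinning, so the CPU currently holding the lock (and waiting
 * for us to run its function) can make progress.
 */
static void on_selected_cpus_sketch(const cpumask_t *selected,
                                    void (*func)(void *info), void *info,
                                    int wait)
{
    while ( !spin_trylock(&call_lock) )
        smp_call_function_interrupt(); /* no-op unless this CPU is selected */

    /* ... fill in call_data, send the IPIs, wait for the mask to clear ... */

    spin_unlock(&call_lock);
}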
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
2014-01-30 19:54 ` Oleksandr Tyshchenko
@ 2014-01-30 21:47 ` Julien Grall
2014-01-31 1:57 ` Oleksandr Tyshchenko
0 siblings, 1 reply; 48+ messages in thread
From: Julien Grall @ 2014-01-30 21:47 UTC (permalink / raw)
To: Oleksandr Tyshchenko, Stefano Stabellini; +Cc: Ian Campbell, xen-devel

Hello,

On 30/01/14 19:54, Oleksandr Tyshchenko wrote:
> I moved to 4.4.0-rc1, which already has the necessary irq patches,

Any specific reason to use 4.4.0-rc1 instead of 4.4.0-rc2? There are a
bunch of fixes (which should not be related to your current bug), such
as a TLB issue, foreign mapping, and a first attempt to fix the guest
cache issue.

> and applied only one patch ("cpumask_of(0) in gic_route_irq_to_guest").
> I see that the hypervisor hangs very often. Unfortunately, I don't
> currently have a debugger to localize the part of the code involved,
> so I have to use prints, and it may take some time(

What do you mean by hang? Do you have any output from Xen? What do you
run? Dom0 and a DomU?

Sincerely yours,

--
Julien Grall

^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
2014-01-30 21:47 ` Julien Grall
@ 2014-01-31 1:57 ` Oleksandr Tyshchenko
0 siblings, 0 replies; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2014-01-31 1:57 UTC (permalink / raw)
To: Julien Grall; +Cc: xen-devel, Ian Campbell, Stefano Stabellini

There is no specific reason) As I found out, we already have a local
tree based on 4.4.0-rc1. All the work of porting our local patches on
top of 4.4.0-rc1 (resolving conflicts, etc.) was done by my colleagues.
I saw that the range of commits you pointed to is present there, so I
just moved.

By "hang" I mean that the hypervisor gets stuck somewhere in interrupt
handling (it gets into an infinite loop trying to acquire a lock, or
waiting for an event), and as a result nothing works, as happened, for
example, in on_selected_cpus(). Of course we don't have any output from
it (the console is not working).

I run a domU; we have an operating system with a UI in domU. After
moving to 4.4.0-rc1, the hypervisor began to hang very often. I have
not yet identified where this is happening. These hangs occur when I
use the touchscreen while domU is running, so it somehow depends on the
touchscreen irq; I would even say on the "touchscreen interrupt rate".
I tried changing the interrupt priority and a few other things, but I
don't have any positive results yet. First I need to localize where the
deadlock happens.

On Thu, Jan 30, 2014 at 11:47 PM, Julien Grall <julien.grall@linaro.org> wrote:
[..]

^ permalink raw reply [flat|nested] 48+ messages in thread
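Since the hang has to be localized with prints, one common trick is a crude lock watchdog. The helper below is hypothetical (not in the Xen tree), its name is made up, and the threshold is arbitrary; it only illustrates the approach.

/*
 * Debugging aid sketch: a bounded spin that reports on the console when
 * a lock appears stuck, to help localize deadlocks without a debugger.
 */
static inline void spin_lock_shout(spinlock_t *lock, const char *what)
{
    unsigned long tries = 0;

    while ( !spin_trylock(lock) )
    {
        if ( ++tries == 100000000UL )
        {
            printk("CPU%d: still waiting for %s\n", smp_processor_id(), what);
            tries = 0;
        }
    }
}

Temporarily substituting suspect spin_lock() calls, e.g. spin_lock_shout(&gic.lock, "gic.lock"), narrows down which lock the stuck CPU is spinning on.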
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
2014-01-28 19:25 ` Oleksandr Tyshchenko
2014-01-29 10:56 ` Oleksandr Tyshchenko
@ 2014-01-29 13:12 ` Julien Grall
2014-01-29 18:55 ` Oleksandr Tyshchenko
1 sibling, 1 reply; 48+ messages in thread
From: Julien Grall @ 2014-01-29 13:12 UTC (permalink / raw)
To: Oleksandr Tyshchenko; +Cc: Stefano Stabellini, Ian Campbell, xen-devel

Hello Oleksandr,

On 28/01/14 19:25, Oleksandr Tyshchenko wrote:

[..]

> > Do you pass-through PPIs to dom0?
>
> If I understand correctly, PPIs are irqs 16 to 31.
> So yes, I do. I see the timer irqs and the maintenance irq, which are
> routed to both CPUs.

These IRQs are used by Xen, therefore they are emulated for dom0 and
domU. Xen won't EOI these IRQs in maintenance_interrupt.

> And I have printed all irqs which go through the gic_route_irq_to_guest
> and gic_route_irq functions:
> ...
> (XEN) GIC initialization:
> (XEN) gic_dist_addr=0000000048211000
> (XEN) gic_cpu_addr=0000000048212000
> (XEN) gic_hyp_addr=0000000048214000
> (XEN) gic_vcpu_addr=0000000048216000
> (XEN) gic_maintenance_irq=25
> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 25, cpu_mask: 00000001
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 30, cpu_mask: 00000001
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 26, cpu_mask: 00000001
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 27, cpu_mask: 00000001
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 104, cpu_mask: 00000001
> (XEN) Using scheduler: SMP Credit Scheduler (credit)
> (XEN) Allocated console ring of 16 KiB.
> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
> (XEN) Bringing up CPU1
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 25, cpu_mask: 00000002
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 30, cpu_mask: 00000002
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 26, cpu_mask: 00000002
> (XEN)
> (XEN) >>>>> gic_route_irq: irq: 27, cpu_mask: 00000002
> (XEN) CPU 1 booted.
> (XEN) Brought up 2 CPUs
> (XEN) *** LOADING DOMAIN 0 ***
> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
> (XEN)
> (XEN) >>>>> gic_route_irq_to_guest: domid: 0, irq: 61, cpu: 0

[..]

> (XEN) >>>>> gic_route_irq_to_guest: domid: 1, irq: 61, cpu: 1

Not related to this patch series, but is it normal that you pass through
the same interrupt both to dom0 and domU?

There are a few other cases like that.

--
Julien Grall

^ permalink raw reply [flat|nested] 48+ messages in thread
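A cheap way to catch the double routing pointed out above would be a guard that refuses to route a physical IRQ which is already in use. This is a hedged sketch only; the exact field names and locking in the 4.4-era code may differ.

/*
 * Hypothetical guard for gic_route_irq_to_guest(): reject a physical
 * IRQ that already has an action installed, i.e. one that was already
 * routed to Xen or to another domain.
 */
static int check_irq_is_free(const struct irq_desc *desc)
{
    if ( desc->action != NULL )
        return -EBUSY;
    return 0;
}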
* Re: [PATCH v1 0/2] xen/arm: maintenance_interrupt SMP fix
2014-01-29 13:12 ` Julien Grall
@ 2014-01-29 18:55 ` Oleksandr Tyshchenko
0 siblings, 0 replies; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2014-01-29 18:55 UTC (permalink / raw)
To: Julien Grall; +Cc: Stefano Stabellini, Ian Campbell, xen-devel

On Wed, Jan 29, 2014 at 3:12 PM, Julien Grall <julien.grall@linaro.org> wrote:
[..]
> Not related to this patch series, but is it normal that you pass through
> the same interrupt both to dom0 and domU?

No, it isn't. These interrupts are not actually used in both domains,
but we need to clean this up. Thank you for your attention.

> There are a few other cases like that.
>
> --
> Julien Grall

--
Oleksandr Tyshchenko | Embedded Developer
GlobalLogic

^ permalink raw reply [flat|nested] 48+ messages in thread
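For reference, the IRQ ranges discussed in this exchange follow the GIC architecture: SGIs are interrupt IDs 0-15, PPIs are 16-31, and SPIs start at 32. Small helpers make such checks explicit; the helpers themselves are illustrative, not taken from the Xen source.

/* GICv2 interrupt ID ranges: SGIs 0-15, PPIs 16-31, SPIs 32 and up. */
static inline int irq_is_ppi(unsigned int irq)
{
    return irq >= 16 && irq < 32;
}

static inline int irq_is_spi(unsigned int irq)
{
    return irq >= 32;
}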