* [PATCH 0/3] misc cpumask adjustments
@ 2016-12-08 15:54 Jan Beulich
  2016-12-08 16:00 ` [PATCH 1/3] make tlbflush_filter()'s first parameter a pointer Jan Beulich
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Jan Beulich @ 2016-12-08 15:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper
1: make tlbflush_filter()'s first parameter a pointer
2: VT-d: correct dma_msi_set_affinity()
3: x86: introduce and use scratch CPU mask
Signed-off-by: Jan Beulich <jbeulich@suse.com>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* [PATCH 1/3] make tlbflush_filter()'s first parameter a pointer
  2016-12-08 15:54 [PATCH 0/3] misc cpumask adjustments Jan Beulich
@ 2016-12-08 16:00 ` Jan Beulich
  2016-12-08 16:26   ` Andrew Cooper
                     ` (2 more replies)
  2016-12-08 16:01 ` [PATCH 2/3] VT-d: correct dma_msi_set_affinity() Jan Beulich
  2016-12-08 16:02 ` [PATCH 3/3] x86: introduce and use scratch CPU mask Jan Beulich
  2 siblings, 3 replies; 22+ messages in thread
From: Jan Beulich @ 2016-12-08 16:00 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall
This brings it in line with most other functions dealing with CPU
masks. Convert both implementations to inline functions at once.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
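As an aside, the conversion from a do/while macro to an inline function also buys compile-time type checking of the mask argument. A standalone sketch of the filtering logic, using toy types only (a uint64_t bitmap stands in for cpumask_t, and toy_need_flush() is an invented stand-in for NEED_FLUSH(); this is not Xen's actual implementation):

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for Xen's cpumask_t and per-CPU TLB flush clock. */
typedef struct { uint64_t bits; } toy_cpumask_t;

static uint32_t toy_tlbflush_time[64];   /* per-CPU last-flush stamp */

static int toy_need_flush(uint32_t cpu_stamp, uint32_t page_stamp)
{
    /* A CPU still needs flushing if it hasn't flushed since the page
     * was last (potentially) mapped. */
    return cpu_stamp < page_stamp;
}

/* Inline-function form: the compiler now checks that callers pass a
 * pointer, which the old do { } while ( 0 ) macro could not enforce. */
static inline void toy_tlbflush_filter(toy_cpumask_t *mask,
                                       uint32_t page_timestamp)
{
    unsigned int cpu;

    for ( cpu = 0; cpu < 64; cpu++ )
        if ( (mask->bits & (1ULL << cpu)) &&
             !toy_need_flush(toy_tlbflush_time[cpu], page_timestamp) )
            mask->bits &= ~(1ULL << cpu);   /* definitely flushed already */
}
```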
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2514,7 +2514,7 @@ static int __get_page_type(struct page_i
                 cpumask_copy(&mask, d->domain_dirty_cpumask);
 
                 /* Don't flush if the timestamp is old enough */
-                tlbflush_filter(mask, page->tlbflush_timestamp);
+                tlbflush_filter(&mask, page->tlbflush_timestamp);
 
                 if ( unlikely(!cpumask_empty(&mask)) &&
                      /* Shadow mode: track only writable pages. */
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1471,7 +1471,7 @@ mfn_t shadow_alloc(struct domain *d,
         /* Before we overwrite the old contents of this page,
          * we need to be sure that no TLB holds a pointer to it. */
         cpumask_copy(&mask, d->domain_dirty_cpumask);
-        tlbflush_filter(mask, sp->tlbflush_timestamp);
+        tlbflush_filter(&mask, sp->tlbflush_timestamp);
         if ( unlikely(!cpumask_empty(&mask)) )
         {
             perfc_incr(shadow_alloc_tlbflush);
--- a/xen/include/asm-arm/flushtlb.h
+++ b/xen/include/asm-arm/flushtlb.h
@@ -8,9 +8,7 @@
  * TLB since @page_timestamp.
  */
 /* XXX lazy implementation just doesn't clear anything.... */
-#define tlbflush_filter(mask, page_timestamp)                           \
-do {                                                                    \
-} while ( 0 )
+static inline void tlbflush_filter(cpumask_t *mask, uint32_t page_timestamp) {}
 
 #define tlbflush_current_time()                 (0)
 
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -50,13 +50,14 @@ static inline int NEED_FLUSH(u32 cpu_sta
  * Filter the given set of CPUs, removing those that definitely flushed their
  * TLB since @page_timestamp.
  */
-#define tlbflush_filter(mask, page_timestamp)                           \
-do {                                                                    \
-    unsigned int cpu;                                                   \
-    for_each_cpu ( cpu, &(mask) )                                       \
-        if ( !NEED_FLUSH(per_cpu(tlbflush_time, cpu), page_timestamp) ) \
-            cpumask_clear_cpu(cpu, &(mask));                            \
-} while ( 0 )
+static inline void tlbflush_filter(cpumask_t *mask, uint32_t page_timestamp)
+{
+    unsigned int cpu;
+
+    for_each_cpu ( cpu, mask )
+        if ( !NEED_FLUSH(per_cpu(tlbflush_time, cpu), page_timestamp) )
+            cpumask_clear_cpu(cpu, mask);
+}
 
 void new_tlbflush_clock_period(void);
 
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -588,9 +588,10 @@ static inline void accumulate_tlbflush(b
 
 static inline void filtered_flush_tlb_mask(uint32_t tlbflush_timestamp)
 {
-    cpumask_t mask = cpu_online_map;
+    cpumask_t mask;
 
-    tlbflush_filter(mask, tlbflush_timestamp);
+    cpumask_copy(&mask, &cpu_online_map);
+    tlbflush_filter(&mask, tlbflush_timestamp);
     if ( !cpumask_empty(&mask) )
     {
         perfc_incr(need_flush_tlb_flush);
* [PATCH 2/3] VT-d: correct dma_msi_set_affinity()
  2016-12-08 15:54 [PATCH 0/3] misc cpumask adjustments Jan Beulich
  2016-12-08 16:00 ` [PATCH 1/3] make tlbflush_filter()'s first parameter a pointer Jan Beulich
@ 2016-12-08 16:01 ` Jan Beulich
  2016-12-08 17:33   ` Andrew Cooper
  2016-12-08 16:02 ` [PATCH 3/3] x86: introduce and use scratch CPU mask Jan Beulich
  2 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2016-12-08 16:01 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Kevin Tian, Feng Wu
Commit 83cd2038fe ("VT-d: use msi_compose_msg()") together with
15aa6c6748 ("amd iommu: use base platform MSI implementation"),
introducing the use of a per-CPU scratch CPU mask, went too far:
dma_msi_set_affinity() may, at least in theory, be called in interrupt
context, and hence the use of that scratch variable is not correct.
Since the function overwrites the destination information anyway,
allow msi_compose_msg() to be called with a NULL CPU mask, avoiding the
use of that scratch variable.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
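The new calling convention can be sketched as follows. This is a deliberately simplified toy (the TOY_* constants and helpers are invented; the real field layout lives in Xen's MSI headers): with a NULL mask the destination field stays zero, so a caller like dma_msi_set_affinity() can simply OR in its own destination afterwards instead of masking the old one out first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the MSI address-register fields. */
#define TOY_MSI_ADDR_BASE        0xfee00000u
#define TOY_MSI_ADDR_DEST_ID(d)  (((d) & 0xffu) << 12)

struct toy_msi_msg {
    uint32_t address_lo;
    uint32_t data;
    uint32_t dest32;
};

/* NULL cpu_mask => leave dest32 (and hence the dest-ID field) at 0,
 * letting the caller fill in the destination itself. */
static void toy_compose_msg(uint8_t vector, const uint32_t *cpu_mask,
                            struct toy_msi_msg *msg)
{
    memset(msg, 0, sizeof(*msg));
    if ( cpu_mask && *cpu_mask )
        /* Toy policy: lowest set bit is the target CPU's APIC ID. */
        msg->dest32 = (uint32_t)__builtin_ctz(*cpu_mask);
    msg->address_lo = TOY_MSI_ADDR_BASE | TOY_MSI_ADDR_DEST_ID(msg->dest32);
    msg->data = vector;
}
```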
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -160,39 +160,37 @@ static bool_t msix_memory_decoded(const
  */
 void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg *msg)
 {
-    unsigned dest;
-
     memset(msg, 0, sizeof(*msg));
-    if ( !cpumask_intersects(cpu_mask, &cpu_online_map) )
+
+    if ( vector < FIRST_DYNAMIC_VECTOR )
         return;
 
-    if ( vector )
+    if ( cpu_mask )
     {
         cpumask_t *mask = this_cpu(scratch_mask);
 
-        cpumask_and(mask, cpu_mask, &cpu_online_map);
-        dest = cpu_mask_to_apicid(mask);
+        if ( !cpumask_intersects(cpu_mask, &cpu_online_map) )
+            return;
 
-        msg->address_hi = MSI_ADDR_BASE_HI;
-        msg->address_lo =
-            MSI_ADDR_BASE_LO |
-            ((INT_DEST_MODE == 0) ?
-             MSI_ADDR_DESTMODE_PHYS:
-             MSI_ADDR_DESTMODE_LOGIC) |
-            ((INT_DELIVERY_MODE != dest_LowestPrio) ?
-             MSI_ADDR_REDIRECTION_CPU:
-             MSI_ADDR_REDIRECTION_LOWPRI) |
-            MSI_ADDR_DEST_ID(dest);
-        msg->dest32 = dest;
-
-        msg->data =
-            MSI_DATA_TRIGGER_EDGE |
-            MSI_DATA_LEVEL_ASSERT |
-            ((INT_DELIVERY_MODE != dest_LowestPrio) ?
-             MSI_DATA_DELIVERY_FIXED:
-             MSI_DATA_DELIVERY_LOWPRI) |
-            MSI_DATA_VECTOR(vector);
+        cpumask_and(mask, cpu_mask, &cpu_online_map);
+        msg->dest32 = cpu_mask_to_apicid(mask);
     }
+
+    msg->address_hi = MSI_ADDR_BASE_HI;
+    msg->address_lo = MSI_ADDR_BASE_LO |
+                      (INT_DEST_MODE ? MSI_ADDR_DESTMODE_LOGIC
+                                     : MSI_ADDR_DESTMODE_PHYS) |
+                      ((INT_DELIVERY_MODE != dest_LowestPrio)
+                       ? MSI_ADDR_REDIRECTION_CPU
+                       : MSI_ADDR_REDIRECTION_LOWPRI) |
+                      MSI_ADDR_DEST_ID(msg->dest32);
+
+    msg->data = MSI_DATA_TRIGGER_EDGE |
+                MSI_DATA_LEVEL_ASSERT |
+                ((INT_DELIVERY_MODE != dest_LowestPrio)
+                 ? MSI_DATA_DELIVERY_FIXED
+                 : MSI_DATA_DELIVERY_LOWPRI) |
+                MSI_DATA_VECTOR(vector);
 }
 
 static bool_t read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1085,11 +1085,10 @@ static void dma_msi_set_affinity(struct
         return;
     }
 
-    msi_compose_msg(desc->arch.vector, desc->arch.cpu_mask, &msg);
-    /* Are these overrides really needed? */
+    msi_compose_msg(desc->arch.vector, NULL, &msg);
     if (x2apic_enabled)
         msg.address_hi = dest & 0xFFFFFF00;
-    msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
+    ASSERT(!(msg.address_lo & MSI_ADDR_DEST_ID_MASK));
     msg.address_lo |= MSI_ADDR_DEST_ID(dest);
     iommu->msi.msg = msg;
 
* [PATCH 3/3] x86: introduce and use scratch CPU mask
  2016-12-08 15:54 [PATCH 0/3] misc cpumask adjustments Jan Beulich
  2016-12-08 16:00 ` [PATCH 1/3] make tlbflush_filter()'s first parameter a pointer Jan Beulich
  2016-12-08 16:01 ` [PATCH 2/3] VT-d: correct dma_msi_set_affinity() Jan Beulich
@ 2016-12-08 16:02 ` Jan Beulich
  2016-12-08 17:51   ` Andrew Cooper
  2 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2016-12-08 16:02 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper
__get_page_type(), so far using an on-stack CPU mask variable, is
involved in the recursion when e.g. pinning page tables. This means
there may be up to five instances of the function active at a time,
implying five instances of the (up to 512 bytes large) CPU mask
variable. With an IRQ happening at the deepest point of the stack, and
with send_guest_pirq() being called from there (leading to vcpu_kick()
-> ... -> csched_vcpu_wake() -> __runq_tickle() ->
cpumask_raise_softirq(), the last two of which also have CPU mask
variables on their stacks), this has been observed to cause a stack
overflow with a 4095-pCPU build.
Introduce a per-CPU variable instead, which can then be used by any
code never running in IRQ context.
The mask can then also be used by other MMU code as well as by
msi_compose_msg() (and quite likely we'll find further uses down the
road).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
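The per-CPU scratch pattern itself can be sketched outside Xen like this (all names invented; a fixed CPU-indexed array stands in for the cpumask_var_t per-CPU variable, and a flag stands in for in_irq()). The point is that recursion depth no longer multiplies the on-stack footprint, at the price of the buffer being usable only by code that can never run in IRQ context:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS     8
#define TOY_MASK_BYTES  512   /* roughly what a 4095-pCPU cpumask_t costs */

/* One scratch mask per CPU instead of one per stack frame.  An IRQ
 * handler using this would silently clobber the mask of whatever code
 * it interrupted, hence the guard below (mirroring the patch's
 * ASSERT(!in_irq()) / BUG_ON(in_irq())). */
static uint8_t toy_scratch_mask[TOY_NR_CPUS][TOY_MASK_BYTES];
static bool toy_in_irq;

static uint8_t *toy_get_scratch(unsigned int cpu)
{
    assert(!toy_in_irq);
    return toy_scratch_mask[cpu];
}
```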
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2477,6 +2477,7 @@ static int __get_page_type(struct page_i
     int rc = 0, iommu_ret = 0;
 
     ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
+    ASSERT(!in_irq());
 
     for ( ; ; )
     {
@@ -2509,20 +2510,21 @@ static int __get_page_type(struct page_i
                  * may be unnecessary (e.g., page was GDT/LDT) but those 
                  * circumstances should be very rare.
                  */
-                cpumask_t mask;
+                cpumask_t *mask = this_cpu(scratch_cpumask);
 
-                cpumask_copy(&mask, d->domain_dirty_cpumask);
+                BUG_ON(in_irq());
+                cpumask_copy(mask, d->domain_dirty_cpumask);
 
                 /* Don't flush if the timestamp is old enough */
-                tlbflush_filter(&mask, page->tlbflush_timestamp);
+                tlbflush_filter(mask, page->tlbflush_timestamp);
 
-                if ( unlikely(!cpumask_empty(&mask)) &&
+                if ( unlikely(!cpumask_empty(mask)) &&
                      /* Shadow mode: track only writable pages. */
                      (!shadow_mode_enabled(page_get_owner(page)) ||
                       ((nx & PGT_type_mask) == PGT_writable_page)) )
                 {
                     perfc_incr(need_flush_tlb_flush);
-                    flush_tlb_mask(&mask);
+                    flush_tlb_mask(mask);
                 }
 
                 /* We lose existing type and validity. */
@@ -3404,22 +3406,22 @@ long do_mmuext_op(
         case MMUEXT_TLB_FLUSH_MULTI:
         case MMUEXT_INVLPG_MULTI:
         {
-            cpumask_t pmask;
+            cpumask_t *mask = this_cpu(scratch_cpumask);
 
             if ( unlikely(d != pg_owner) )
                 rc = -EPERM;
             else if ( unlikely(vcpumask_to_pcpumask(d,
                                    guest_handle_to_param(op.arg2.vcpumask,
                                                          const_void),
-                                   &pmask)) )
+                                   mask)) )
                 rc = -EINVAL;
             if ( unlikely(rc) )
                 break;
 
             if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI )
-                flush_tlb_mask(&pmask);
+                flush_tlb_mask(mask);
             else if ( __addr_ok(op.arg1.linear_addr) )
-                flush_tlb_one_mask(&pmask, op.arg1.linear_addr);
+                flush_tlb_one_mask(mask, op.arg1.linear_addr);
             break;
         }
 
@@ -3457,14 +3459,14 @@ long do_mmuext_op(
             else if ( likely(cache_flush_permitted(d)) )
             {
                 unsigned int cpu;
-                cpumask_t mask;
+                cpumask_t *mask = this_cpu(scratch_cpumask);
 
-                cpumask_clear(&mask);
+                cpumask_clear(mask);
                 for_each_online_cpu(cpu)
-                    if ( !cpumask_intersects(&mask,
+                    if ( !cpumask_intersects(mask,
                                              per_cpu(cpu_sibling_mask, cpu)) )
-                        __cpumask_set_cpu(cpu, &mask);
-                flush_mask(&mask, FLUSH_CACHE);
+                        __cpumask_set_cpu(cpu, mask);
+                flush_mask(mask, FLUSH_CACHE);
             }
             else
             {
@@ -4460,7 +4462,7 @@ static int __do_update_va_mapping(
     struct page_info *gl1pg;
     l1_pgentry_t  *pl1e;
     unsigned long  bmap_ptr, gl1mfn;
-    cpumask_t      pmask;
+    cpumask_t     *mask = NULL;
     int            rc;
 
     perfc_incr(calls_to_update_va);
@@ -4506,15 +4508,17 @@ static int __do_update_va_mapping(
             flush_tlb_local();
             break;
         case UVMF_ALL:
-            flush_tlb_mask(d->domain_dirty_cpumask);
+            mask = d->domain_dirty_cpumask;
             break;
         default:
+            mask = this_cpu(scratch_cpumask);
             rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
                                                                      void),
-                                      &pmask);
-            flush_tlb_mask(&pmask);
+                                      mask);
             break;
         }
+        if ( mask )
+            flush_tlb_mask(mask);
         break;
 
     case UVMF_INVLPG:
@@ -4524,15 +4528,17 @@ static int __do_update_va_mapping(
             paging_invlpg(v, va);
             break;
         case UVMF_ALL:
-            flush_tlb_one_mask(d->domain_dirty_cpumask, va);
+            mask = d->domain_dirty_cpumask;
             break;
         default:
+            mask = this_cpu(scratch_cpumask);
             rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
                                                                      void),
-                                      &pmask);
-            flush_tlb_one_mask(&pmask, va);
+                                      mask);
             break;
         }
+        if ( mask )
+            flush_tlb_one_mask(mask, va);
         break;
     }
 
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -40,7 +40,6 @@ static void __pci_disable_msix(struct ms
 /* bitmap indicate which fixed map is free */
 static DEFINE_SPINLOCK(msix_fixmap_lock);
 static DECLARE_BITMAP(msix_fixmap_pages, FIX_MSIX_MAX_PAGES);
-static DEFINE_PER_CPU(cpumask_var_t, scratch_mask);
 
 static int msix_fixmap_alloc(void)
 {
@@ -167,7 +166,7 @@ void msi_compose_msg(unsigned vector, co
 
     if ( cpu_mask )
     {
-        cpumask_t *mask = this_cpu(scratch_mask);
+        cpumask_t *mask = this_cpu(scratch_cpumask);
 
         if ( !cpumask_intersects(cpu_mask, &cpu_online_map) )
             return;
@@ -1458,43 +1457,12 @@ int pci_restore_msi_state(struct pci_dev
     return 0;
 }
 
-static int msi_cpu_callback(
-    struct notifier_block *nfb, unsigned long action, void *hcpu)
-{
-    unsigned int cpu = (unsigned long)hcpu;
-
-    switch ( action )
-    {
-    case CPU_UP_PREPARE:
-        if ( !alloc_cpumask_var(&per_cpu(scratch_mask, cpu)) )
-            return notifier_from_errno(ENOMEM);
-        break;
-    case CPU_UP_CANCELED:
-    case CPU_DEAD:
-        free_cpumask_var(per_cpu(scratch_mask, cpu));
-        break;
-    default:
-        break;
-    }
-
-    return NOTIFY_DONE;
-}
-
-static struct notifier_block msi_cpu_nfb = {
-    .notifier_call = msi_cpu_callback
-};
-
 void __init early_msi_init(void)
 {
     if ( use_msi < 0 )
         use_msi = !(acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_MSI);
     if ( !use_msi )
         return;
-
-    register_cpu_notifier(&msi_cpu_nfb);
-    if ( msi_cpu_callback(&msi_cpu_nfb, CPU_UP_PREPARE, NULL) &
-         NOTIFY_STOP_MASK )
-        BUG();
 }
 
 static void dump_msi(unsigned char key)
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -56,6 +56,8 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t
 /* representing HT and core siblings of each logical CPU */
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
 
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
+
 cpumask_t cpu_online_map __read_mostly;
 EXPORT_SYMBOL(cpu_online_map);
 
@@ -646,6 +648,7 @@ static void cpu_smpboot_free(unsigned in
 
     free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
     free_cpumask_var(per_cpu(cpu_core_mask, cpu));
+    free_cpumask_var(per_cpu(scratch_cpumask, cpu));
 
     if ( per_cpu(stubs.addr, cpu) )
     {
@@ -734,7 +737,8 @@ static int cpu_smpboot_alloc(unsigned in
         goto oom;
 
     if ( zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
-         zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) )
+         zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) &&
+         alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu)) )
         return 0;
 
  oom:
@@ -791,7 +795,8 @@ void __init smp_prepare_cpus(unsigned in
         panic("No memory for socket CPU siblings map");
 
     if ( !zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, 0)) ||
-         !zalloc_cpumask_var(&per_cpu(cpu_core_mask, 0)) )
+         !zalloc_cpumask_var(&per_cpu(cpu_core_mask, 0)) ||
+         !alloc_cpumask_var(&per_cpu(scratch_cpumask, 0)) )
         panic("No memory for boot CPU sibling/core maps");
 
     set_cpu_sibling_map(0);
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -25,6 +25,7 @@
  */
 DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_mask);
 DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
+DECLARE_PER_CPU(cpumask_var_t, scratch_cpumask);
 
 void smp_send_nmi_allbutself(void);
 
 
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
+
 cpumask_t cpu_online_map __read_mostly;
 EXPORT_SYMBOL(cpu_online_map);
 
@@ -646,6 +648,7 @@ static void cpu_smpboot_free(unsigned in
 
     free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
     free_cpumask_var(per_cpu(cpu_core_mask, cpu));
+    free_cpumask_var(per_cpu(scratch_cpumask, cpu));
 
     if ( per_cpu(stubs.addr, cpu) )
     {
@@ -734,7 +737,8 @@ static int cpu_smpboot_alloc(unsigned in
         goto oom;
 
     if ( zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
-         zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) )
+         zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) &&
+         alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu)) )
         return 0;
 
  oom:
@@ -791,7 +795,8 @@ void __init smp_prepare_cpus(unsigned in
         panic("No memory for socket CPU siblings map");
 
     if ( !zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, 0)) ||
-         !zalloc_cpumask_var(&per_cpu(cpu_core_mask, 0)) )
+         !zalloc_cpumask_var(&per_cpu(cpu_core_mask, 0)) ||
+         !alloc_cpumask_var(&per_cpu(scratch_cpumask, 0)) )
         panic("No memory for boot CPU sibling/core maps");
 
     set_cpu_sibling_map(0);
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -25,6 +25,7 @@
  */
 DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_mask);
 DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
+DECLARE_PER_CPU(cpumask_var_t, scratch_cpumask);
 
 void smp_send_nmi_allbutself(void);
 
* Re: [PATCH 1/3] make tlbflush_filter()'s first parameter a pointer
  2016-12-08 16:00 ` [PATCH 1/3] make tlbflush_filter()'s first parameter a pointer Jan Beulich
@ 2016-12-08 16:26   ` Andrew Cooper
  2016-12-09 10:39   ` Wei Liu
  2016-12-09 18:05   ` Julien Grall
  2 siblings, 0 replies; 22+ messages in thread
From: Andrew Cooper @ 2016-12-08 16:26 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
	Ian Jackson, Julien Grall
On 08/12/16 16:00, Jan Beulich wrote:
> This brings it in line with most other functions dealing with CPU
> masks. Convert both implementations to inline functions at once.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
* Re: [PATCH 2/3] VT-d: correct dma_msi_set_affinity()
  2016-12-08 16:01 ` [PATCH 2/3] VT-d: correct dma_msi_set_affinity() Jan Beulich
@ 2016-12-08 17:33   ` Andrew Cooper
  2016-12-09  8:47     ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2016-12-08 17:33 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Kevin Tian, Feng Wu
On 08/12/16 16:01, Jan Beulich wrote:
> That commit ("VT-d: use msi_compose_msg()) together with 15aa6c6748
Which commit?
> ("amd iommu: use base platform MSI implementation") introducing the use
> of a per-CPU scratch CPU mask went too far: dma_msi_set_affinity() may,
> at least in theory, be called in interrupt context, and hence the use
> of that scratch variable is not correct.
>
> Since the function overwrites the destination information anyway,
> allow msi_compose_msg() to be called with a NULL CPU mask, avoiding the
> use of that scratch variable.
Which function overwrites what?  I can't see dma_msi_set_affinity()
doing anything to clobber msg.dest32, so I don't understand why this
change is correct.
I can see why the current behaviour is unsafe, and agree that it should
change.
~Andrew
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -160,39 +160,37 @@ static bool_t msix_memory_decoded(const
>   */
>  void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg *msg)
>  {
> -    unsigned dest;
> -
>      memset(msg, 0, sizeof(*msg));
> -    if ( !cpumask_intersects(cpu_mask, &cpu_online_map) )
> +
> +    if ( vector < FIRST_DYNAMIC_VECTOR )
>          return;
>  
> -    if ( vector )
> +    if ( cpu_mask )
>      {
>          cpumask_t *mask = this_cpu(scratch_mask);
>  
> -        cpumask_and(mask, cpu_mask, &cpu_online_map);
> -        dest = cpu_mask_to_apicid(mask);
> +        if ( !cpumask_intersects(cpu_mask, &cpu_online_map) )
> +            return;
>  
> -        msg->address_hi = MSI_ADDR_BASE_HI;
> -        msg->address_lo =
> -            MSI_ADDR_BASE_LO |
> -            ((INT_DEST_MODE == 0) ?
> -             MSI_ADDR_DESTMODE_PHYS:
> -             MSI_ADDR_DESTMODE_LOGIC) |
> -            ((INT_DELIVERY_MODE != dest_LowestPrio) ?
> -             MSI_ADDR_REDIRECTION_CPU:
> -             MSI_ADDR_REDIRECTION_LOWPRI) |
> -            MSI_ADDR_DEST_ID(dest);
> -        msg->dest32 = dest;
> -
> -        msg->data =
> -            MSI_DATA_TRIGGER_EDGE |
> -            MSI_DATA_LEVEL_ASSERT |
> -            ((INT_DELIVERY_MODE != dest_LowestPrio) ?
> -             MSI_DATA_DELIVERY_FIXED:
> -             MSI_DATA_DELIVERY_LOWPRI) |
> -            MSI_DATA_VECTOR(vector);
> +        cpumask_and(mask, cpu_mask, &cpu_online_map);
> +        msg->dest32 = cpu_mask_to_apicid(mask);
>      }
> +
> +    msg->address_hi = MSI_ADDR_BASE_HI;
> +    msg->address_lo = MSI_ADDR_BASE_LO |
> +                      (INT_DEST_MODE ? MSI_ADDR_DESTMODE_LOGIC
> +                                     : MSI_ADDR_DESTMODE_PHYS) |
> +                      ((INT_DELIVERY_MODE != dest_LowestPrio)
> +                       ? MSI_ADDR_REDIRECTION_CPU
> +                       : MSI_ADDR_REDIRECTION_LOWPRI) |
> +                      MSI_ADDR_DEST_ID(msg->dest32);
> +
> +    msg->data = MSI_DATA_TRIGGER_EDGE |
> +                MSI_DATA_LEVEL_ASSERT |
> +                ((INT_DELIVERY_MODE != dest_LowestPrio)
> +                 ? MSI_DATA_DELIVERY_FIXED
> +                 : MSI_DATA_DELIVERY_LOWPRI) |
> +                MSI_DATA_VECTOR(vector);
>  }
>  
>  static bool_t read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1085,11 +1085,10 @@ static void dma_msi_set_affinity(struct
>          return;
>      }
>  
> -    msi_compose_msg(desc->arch.vector, desc->arch.cpu_mask, &msg);
> -    /* Are these overrides really needed? */
> +    msi_compose_msg(desc->arch.vector, NULL, &msg);
>      if (x2apic_enabled)
>          msg.address_hi = dest & 0xFFFFFF00;
> -    msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
> +    ASSERT(!(msg.address_lo & MSI_ADDR_DEST_ID_MASK));
>      msg.address_lo |= MSI_ADDR_DEST_ID(dest);
>      iommu->msi.msg = msg;
>  
>
>
>
* Re: [PATCH 3/3] x86: introduce and use scratch CPU mask
  2016-12-08 16:02 ` [PATCH 3/3] x86: introduce and use scratch CPU mask Jan Beulich
@ 2016-12-08 17:51   ` Andrew Cooper
  2016-12-09  8:59     ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2016-12-08 17:51 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
On 08/12/16 16:02, Jan Beulich wrote:
> __get_page_type(), so far using an on-stack CPU mask variable, is
> involved in the recursion when e.g. pinning page tables. This means
"in recursion".
> there may be up two five instances of the function active at a time,
"up to five".
> implying five instances of the (up to 512 bytes large) CPU mask
> variable. With an IRQ happening at the deepest point of the stack, and
> with send_guest_pirq() being called from there (leading to vcpu_kick()
> -> ... -> csched_vcpu_wake() -> __runq_tickle() ->
> cpumask_raise_softirq(), the last two of which also have CPU mask
> variables on their stacks), this has been observed to cause a stack
"stacks), has been".
> overflow with a 4095-pCPU build.
>
> Introduce a per-CPU variable instead, which can then be used by any
> code never running in IRQ context.
>
> The mask can then also be used by other MMU code as well as by
> msi_compose_msg() (and quite likely we'll find further uses down the
> road).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -2477,6 +2477,7 @@ static int __get_page_type(struct page_i
>      int rc = 0, iommu_ret = 0;
>  
>      ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
> +    ASSERT(!in_irq());
>  
>      for ( ; ; )
>      {
> @@ -2509,20 +2510,21 @@ static int __get_page_type(struct page_i
>                   * may be unnecessary (e.g., page was GDT/LDT) but those 
>                   * circumstances should be very rare.
>                   */
> -                cpumask_t mask;
> +                cpumask_t *mask = this_cpu(scratch_cpumask);
This indirection looks suspicious.  Why do you have a per_cpu pointer to
a cpumask, with a dynamically allocated mask?
It would be smaller and more efficient overall to have a fully cpumask
allocated in the per-cpu area, and use it via
cpumask_t *mask = &this_cpu(scratch_cpumask);
except...
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -56,6 +56,8 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t
>  /* representing HT and core siblings of each logical CPU */
>  DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
>  
> +DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
> +
.. this does look correct, but..
>  cpumask_t cpu_online_map __read_mostly;
>  EXPORT_SYMBOL(cpu_online_map);
>  
> @@ -646,6 +648,7 @@ static void cpu_smpboot_free(unsigned in
>  
>      free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
>      free_cpumask_var(per_cpu(cpu_core_mask, cpu));
> +    free_cpumask_var(per_cpu(scratch_cpumask, cpu));
>  
>      if ( per_cpu(stubs.addr, cpu) )
>      {
> @@ -734,7 +737,8 @@ static int cpu_smpboot_alloc(unsigned in
>          goto oom;
>  
>      if ( zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
> -         zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) )
> +         zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) &&
> +         alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu)) )
>          return 0;
This doesn't.  I'm confused.
~Andrew
* Re: [PATCH 2/3] VT-d: correct dma_msi_set_affinity()
  2016-12-08 17:33   ` Andrew Cooper
@ 2016-12-09  8:47     ` Jan Beulich
  2016-12-13  5:23       ` Tian, Kevin
  2016-12-15  9:54       ` Ping: " Jan Beulich
  0 siblings, 2 replies; 22+ messages in thread
From: Jan Beulich @ 2016-12-09  8:47 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Kevin Tian, Feng Wu
>>> On 08.12.16 at 18:33, <andrew.cooper3@citrix.com> wrote:
> On 08/12/16 16:01, Jan Beulich wrote:
>> That commit ("VT-d: use msi_compose_msg()) together with 15aa6c6748
> 
> Which commit?
Oops - initially I had intended the title to include the hash: 83cd2038fe.
I've adjusted the text.
>> ("amd iommu: use base platform MSI implementation") introducing the use
>> of a per-CPU scratch CPU mask went too far: dma_msi_set_affinity() may,
>> at least in theory, be called in interrupt context, and hence the use
>> of that scratch variable is not correct.
>>
>> Since the function overwrites the destination information anyway,
>> allow msi_compose_msg() to be called with a NULL CPU mask, avoiding the
>> use of that scratch variable.
> 
> Which function overwrites what?  I can't see dma_msi_set_affinity()
> doing anything to clobber msg.dest32, so I don't understand why this
> change is correct.
msg.dest32 simply isn't being used. msg is local to that function, so
all that matters is which fields the function consumes. It uses only
address and data, and updates address to properly specify the
intended destination. To guard against stale data (in
iommu->msi.msg), it may be reasonable to nevertheless set dest32
before storing msg into that field.
Jan
* Re: [PATCH 3/3] x86: introduce and use scratch CPU mask
  2016-12-08 17:51   ` Andrew Cooper
@ 2016-12-09  8:59     ` Jan Beulich
  2016-12-15  9:56       ` Ping: " Jan Beulich
  2016-12-15 14:15       ` Andrew Cooper
  0 siblings, 2 replies; 22+ messages in thread
From: Jan Beulich @ 2016-12-09  8:59 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel
>>> On 08.12.16 at 18:51, <andrew.cooper3@citrix.com> wrote:
> On 08/12/16 16:02, Jan Beulich wrote:
>> variable. With an IRQ happening at the deepest point of the stack, and
>> with send_guest_pirq() being called from there (leading to vcpu_kick()
>> -> ... -> csched_vcpu_wake() -> __runq_tickle() ->
>> cpumask_raise_softirq(), the last two of which also have CPU mask
>> variables on their stacks), this has been observed to cause a stack
> 
> "stacks), has been".
Hmm, that looks strange to me: Wouldn't the dropping of "this"
also require the "With" at the start of the sentence to be dropped
(and isn't the sentence okay with both left in place)?
>> @@ -2509,20 +2510,21 @@ static int __get_page_type(struct page_i
>>                   * may be unnecessary (e.g., page was GDT/LDT) but those 
>>                   * circumstances should be very rare.
>>                   */
>> -                cpumask_t mask;
>> +                cpumask_t *mask = this_cpu(scratch_cpumask);
> 
> This indirection looks suspicious.  Why do you have a per_cpu pointer to
> a cpumask, with a dynamically allocated mask?
> 
> It would be smaller and more efficient overall to have a fully cpumask
> allocated in the per-cpu area, and use it via
Well, as you can see from the smpboot.c context of the
modifications done, that's how other masks are being dealt with
too. The reasoning is that it is quite wasteful to pre-allocate 512
bytes for a CPU mask when on the running system perhaps only
the low few bytes will be used.
Overall I'm getting the impression from your comments that you
simply didn't recognize the use of cpumask_t vs cpumask_var_t
in the various places.
Jan
* Re: [PATCH 1/3] make tlbflush_filter()'s first parameter a pointer
  2016-12-08 16:00 ` [PATCH 1/3] make tlbflush_filter()'s first parameter a pointer Jan Beulich
  2016-12-08 16:26   ` Andrew Cooper
@ 2016-12-09 10:39   ` Wei Liu
  2016-12-09 18:05   ` Julien Grall
  2 siblings, 0 replies; 22+ messages in thread
From: Wei Liu @ 2016-12-09 10:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel
On Thu, Dec 08, 2016 at 09:00:23AM -0700, Jan Beulich wrote:
> This brings it in line with most other functions dealing with CPU
> masks. Convert both implementations to inline functions at once.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
* Re: [PATCH 1/3] make tlbflush_filter()'s first parameter a pointer
  2016-12-08 16:00 ` [PATCH 1/3] make tlbflush_filter()'s first parameter a pointer Jan Beulich
  2016-12-08 16:26   ` Andrew Cooper
  2016-12-09 10:39   ` Wei Liu
@ 2016-12-09 18:05   ` Julien Grall
  2 siblings, 0 replies; 22+ messages in thread
From: Julien Grall @ 2016-12-09 18:05 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan
Hi Jan,
On 08/12/16 16:00, Jan Beulich wrote:
> This brings it in line with most other functions dealing with CPU
> masks. Convert both implementations to inline functions at once.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Regards,
-- 
Julien Grall
* Re: [PATCH 2/3] VT-d: correct dma_msi_set_affinity()
  2016-12-09  8:47     ` Jan Beulich
@ 2016-12-13  5:23       ` Tian, Kevin
  2016-12-13  7:30         ` Jan Beulich
  2016-12-15  9:54       ` Ping: " Jan Beulich
  1 sibling, 1 reply; 22+ messages in thread
From: Tian, Kevin @ 2016-12-13  5:23 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: xen-devel, Wu, Feng
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, December 09, 2016 4:47 PM
> 
> >>> On 08.12.16 at 18:33, <andrew.cooper3@citrix.com> wrote:
> > On 08/12/16 16:01, Jan Beulich wrote:
> >> That commit ("VT-d: use msi_compose_msg()) together with 15aa6c6748
> >
> > Which commit?
> 
> Oops - initially I had intended the title to include the hash: 83cd2038fe.
> I've adjusted the text.
> 
> >> ("amd iommu: use base platform MSI implementation") introducing the use
> >> of a per-CPU scratch CPU mask went too far: dma_msi_set_affinity() may,
> >> at least in theory, be called in interrupt context, and hence the use
> >> of that scratch variable is not correct.
> >>
> >> Since the function overwrites the destination information anyway,
> >> allow msi_compose_msg() to be called with a NULL CPU mask, avoiding the
> >> use of that scratch variable.
> >
> > Which function overwrites what?  I can't see dma_msi_set_affinity()
> > doing anything to clobber msg.dest32, so I don't understand why this
> > change is correct.
> 
> msg.dest32 simply isn't being used. msg is local to that function, so
> all that matters is which fields the function consumes. It uses only
> address and data, and updates address to properly specify the
> intended destination. To guard against stale data (in
> iommu->msi.msg), it may be reasonable to nevertheless set dest32
> before storing msg into that field.
> 
So do you plan to send a v2 or stick with the current version?
Thanks
Kevin
* Re: [PATCH 2/3] VT-d: correct dma_msi_set_affinity()
  2016-12-13  5:23       ` Tian, Kevin
@ 2016-12-13  7:30         ` Jan Beulich
  2016-12-14  4:58           ` Tian, Kevin
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2016-12-13  7:30 UTC (permalink / raw)
  To: Andrew Cooper, Kevin Tian; +Cc: xen-devel, Feng Wu
>>> On 13.12.16 at 06:23, <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Friday, December 09, 2016 4:47 PM
>> 
>> >>> On 08.12.16 at 18:33, <andrew.cooper3@citrix.com> wrote:
>> > On 08/12/16 16:01, Jan Beulich wrote:
>> >> That commit ("VT-d: use msi_compose_msg()) together with 15aa6c6748
>> >
>> > Which commit?
>> 
>> Oops - initially I had intended the title to include the hash: 83cd2038fe.
>> I've adjusted the text.
>> 
>> >> ("amd iommu: use base platform MSI implementation") introducing the use
>> >> of a per-CPU scratch CPU mask went too far: dma_msi_set_affinity() may,
>> >> at least in theory, be called in interrupt context, and hence the use
>> >> of that scratch variable is not correct.
>> >>
>> >> Since the function overwrites the destination information anyway,
>> >> allow msi_compose_msg() to be called with a NULL CPU mask, avoiding the
>> >> use of that scratch variable.
>> >
>> > Which function overwrites what?  I can't see dma_msi_set_affinity()
>> > doing anything to clobber msg.dest32, so I don't understand why this
>> > change is correct.
>> 
>> msg.dest32 simply isn't being used. msg is local to that function, so
>> all that matters is which fields the function consumes. It uses only
>> address and data, and updates address to properly specify the
>> intended destination. To guard against stale data (in
>> iommu->msi.msg), it may be reasonable to nevertheless set dest32
>> before storing msg into that field.
> 
> So do you plan to send v2 or stick with current version?
Well, so far I haven't heard back from Andrew, and hence didn't
plan on sending a v2 yet. If that addition is going to be the only
adjustment, I'm also not sure sending a v2 is actually necessary.
Jan
* Re: [PATCH 2/3] VT-d: correct dma_msi_set_affinity()
  2016-12-13  7:30         ` Jan Beulich
@ 2016-12-14  4:58           ` Tian, Kevin
  0 siblings, 0 replies; 22+ messages in thread
From: Tian, Kevin @ 2016-12-14  4:58 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: xen-devel, Wu, Feng
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, December 13, 2016 3:30 PM
> 
> >>> On 13.12.16 at 06:23, <kevin.tian@intel.com> wrote:
> >>  From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Friday, December 09, 2016 4:47 PM
> >>
> >> >>> On 08.12.16 at 18:33, <andrew.cooper3@citrix.com> wrote:
> >> > On 08/12/16 16:01, Jan Beulich wrote:
> >> >> That commit ("VT-d: use msi_compose_msg()) together with 15aa6c6748
> >> >
> >> > Which commit?
> >>
> >> Oops - initially I had intended the title to include the hash: 83cd2038fe.
> >> I've adjusted the text.
> >>
> >> >> ("amd iommu: use base platform MSI implementation") introducing the use
> >> >> of a per-CPU scratch CPU mask went too far: dma_msi_set_affinity() may,
> >> >> at least in theory, be called in interrupt context, and hence the use
> >> >> of that scratch variable is not correct.
> >> >>
> >> >> Since the function overwrites the destination information anyway,
> >> >> allow msi_compose_msg() to be called with a NULL CPU mask, avoiding the
> >> >> use of that scratch variable.
> >> >
> >> > Which function overwrites what?  I can't see dma_msi_set_affinity()
> >> > doing anything to clobber msg.dest32, so I don't understand why this
> >> > change is correct.
> >>
> >> msg.dest32 simply isn't being used. msg is local to that function, so
> >> all that matters is which fields the function consumes. It uses only
> >> address and data, and updates address to properly specify the
> >> intended destination. To guard against stale data (in
> >> iommu->msi.msg), it may be reasonable to nevertheless set dest32
> >> before storing msg into that field.
> >
> > So do you plan to send v2 or stick with current version?
> 
> Well, so far I haven't heard back from Andrew, and hence didn't
> plan on sending a v2 yet. If that addition is going to be the only
> adjustment, I'm also not sure sending a v2 is actually necessary.
> 
then reviewed-by: Kevin Tian <kevin.tian@intel.com>
Thanks
Kevin
* Ping: Re: [PATCH 2/3] VT-d: correct dma_msi_set_affinity()
  2016-12-09  8:47     ` Jan Beulich
  2016-12-13  5:23       ` Tian, Kevin
@ 2016-12-15  9:54       ` Jan Beulich
  2016-12-15 12:52         ` Andrew Cooper
  1 sibling, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2016-12-15  9:54 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Kevin Tian, Feng Wu
>>> On 09.12.16 at 09:47, <JBeulich@suse.com> wrote:
>>>> On 08.12.16 at 18:33, <andrew.cooper3@citrix.com> wrote:
>> On 08/12/16 16:01, Jan Beulich wrote:
>>> That commit ("VT-d: use msi_compose_msg()) together with 15aa6c6748
>> 
>> Which commit?
> 
> Oops - initially I had intended the title to include the hash: 83cd2038fe.
> I've adjusted the text.
> 
>>> ("amd iommu: use base platform MSI implementation") introducing the use
>>> of a per-CPU scratch CPU mask went too far: dma_msi_set_affinity() may,
>>> at least in theory, be called in interrupt context, and hence the use
>>> of that scratch variable is not correct.
>>>
>>> Since the function overwrites the destination information anyway,
>>> allow msi_compose_msg() to be called with a NULL CPU mask, avoiding the
>>> use of that scratch variable.
>> 
>> Which function overwrites what?  I can't see dma_msi_set_affinity()
>> doing anything to clobber msg.dest32, so I don't understand why this
>> change is correct.
> 
> msg.dest32 simply isn't being used. msg is local to that function, so
> all that matters is which fields the function consumes. It uses only
> address and data, and updates address to properly specify the
> intended destination. To guard against stale data (in
> iommu->msi.msg), it may be reasonable to nevertheless set dest32
> before storing msg into that field.
Any further thoughts here? Do I need to resend with that one line
added?
Jan
* Ping: Re: [PATCH 3/3] x86: introduce and use scratch CPU mask
  2016-12-09  8:59     ` Jan Beulich
@ 2016-12-15  9:56       ` Jan Beulich
  2016-12-15 14:15       ` Andrew Cooper
  1 sibling, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2016-12-15  9:56 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel
>>> On 09.12.16 at 09:59, <JBeulich@suse.com> wrote:
>>>> On 08.12.16 at 18:51, <andrew.cooper3@citrix.com> wrote:
>> On 08/12/16 16:02, Jan Beulich wrote:
>>> variable. With an IRQ happening at the deepest point of the stack, and
>>> with send_guest_pirq() being called from there (leading to vcpu_kick()
>>> -> ... -> csched_vcpu_wake() -> __runq_tickle() ->
>>> cpumask_raise_softirq(), the last two of which also have CPU mask
>>> variables on their stacks), this has been observed to cause a stack
>> 
>> "stacks), has been".
> 
> Hmm, that looks strange to me: Wouldn't the dropping of "this"
> also require the "With" at the start of the sentence to be dropped
> (and isn't the sentence okay with both left in place)?
> 
>>> @@ -2509,20 +2510,21 @@ static int __get_page_type(struct page_i
>>>                   * may be unnecessary (e.g., page was GDT/LDT) but those 
>>>                   * circumstances should be very rare.
>>>                   */
>>> -                cpumask_t mask;
>>> +                cpumask_t *mask = this_cpu(scratch_cpumask);
>> 
>> This indirection looks suspicious.  Why do you have a per_cpu pointer to
>> a cpumask, with a dynamically allocated mask?
>> 
>> It would be smaller and more efficient overall to have a fully cpumask
>> allocated in the per-cpu area, and use it via
> 
> Well, as you can see from the smpboot.c context of the
> modifications done, that's how other masks are being dealt with
> too. The reasoning is that it is quite wasteful to pre-allocate 512
> bytes for a CPU mask when on the running system perhaps only
> the low few bytes will be used.
> 
> Overall I'm getting the impression from your comments that you
> simply didn't recognize the use of cpumask_t vs cpumask_var_t
> in the various places.
This one did stall too.
Jan
* Re: Ping: Re: [PATCH 2/3] VT-d: correct dma_msi_set_affinity()
  2016-12-15  9:54       ` Ping: " Jan Beulich
@ 2016-12-15 12:52         ` Andrew Cooper
  2016-12-15 14:16           ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2016-12-15 12:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Kevin Tian, Feng Wu
On 15/12/16 09:54, Jan Beulich wrote:
>>>> On 09.12.16 at 09:47, <JBeulich@suse.com> wrote:
>>>>> On 08.12.16 at 18:33, <andrew.cooper3@citrix.com> wrote:
>>> On 08/12/16 16:01, Jan Beulich wrote:
>>>> That commit ("VT-d: use msi_compose_msg()) together with 15aa6c6748
>>> Which commit?
>> Oops - initially I had intended the title to include the hash: 83cd2038fe.
>> I've adjusted the text.
>>
>>>> ("amd iommu: use base platform MSI implementation") introducing the use
>>>> of a per-CPU scratch CPU mask went too far: dma_msi_set_affinity() may,
>>>> at least in theory, be called in interrupt context, and hence the use
>>>> of that scratch variable is not correct.
>>>>
>>>> Since the function overwrites the destination information anyway,
>>>> allow msi_compose_msg() to be called with a NULL CPU mask, avoiding the
>>>> use of that scratch variable.
>>> Which function overwrites what?  I can't see dma_msi_set_affinity()
>>> doing anything to clobber msg.dest32, so I don't understand why this
>>> change is correct.
>> msg.dest32 simply isn't being used. msg is local to that function, so
>> all that matters is which fields the function consumes. Is uses only
>> address and data, and updates address to properly specify the
>> intended destination. To guard against stale data (in
>> iommu->msi.msg), it may be reasonable to nevertheless set dest32
>> before storing msg into that field.
> Any further thoughts here? Do I need to resend with that one line
> added?
msg is tiny.  I'd suggest just initialising to 0 at the start of the
function.
~Andrew
* Re: [PATCH 3/3] x86: introduce and use scratch CPU mask
  2016-12-09  8:59     ` Jan Beulich
  2016-12-15  9:56       ` Ping: " Jan Beulich
@ 2016-12-15 14:15       ` Andrew Cooper
  2016-12-15 14:59         ` Jan Beulich
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2016-12-15 14:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel
On 09/12/16 08:59, Jan Beulich wrote:
>>>> On 08.12.16 at 18:51, <andrew.cooper3@citrix.com> wrote:
>> On 08/12/16 16:02, Jan Beulich wrote:
>>> variable. With an IRQ happening at the deepest point of the stack, and
>>> with send_guest_pirq() being called from there (leading to vcpu_kick()
>>> -> ... -> csched_vcpu_wake() -> __runq_tickle() ->
>>> cpumask_raise_softirq(), the last two of which also have CPU mask
>>> variables on their stacks), this has been observed to cause a stack
>> "stacks), has been".
> Hmm, that looks strange to me: Wouldn't the dropping of "this"
> also require the "With" at the start of the sentence to be dropped
> (and isn't the sentence okay with both left in place)?
The use of "this" requires a substantial backtrack through the text to
evaluate what it refers to, which, as you point out, I didn't manage to do
successfully.
As is evident, I had a very hard time trying to parse the sentence.
It would be clearer to read if you took out both the "With" and "this".
>
>>> @@ -2509,20 +2510,21 @@ static int __get_page_type(struct page_i
>>>                   * may be unnecessary (e.g., page was GDT/LDT) but those 
>>>                   * circumstances should be very rare.
>>>                   */
>>> -                cpumask_t mask;
>>> +                cpumask_t *mask = this_cpu(scratch_cpumask);
>> This indirection looks suspicious.  Why do you have a per_cpu pointer to
>> a cpumask, with a dynamically allocated mask?
>>
>> It would be smaller and more efficient overall to have a fully cpumask
>> allocated in the per-cpu area, and use it via
> Well, as you can see from the smpboot.c context of the
> modifications done, that's how other masks are being dealt with
> too. The reasoning is that it is quite wasteful to pre-allocate 512
> bytes for a CPU mask when on the running system perhaps only
> the low few bytes will be used.
>
> Overall I'm getting the impression from your comments that you
> simply didn't recognize the use of cpumask_t vs cpumask_var_t
> in the various places.
Ok - on the basis that this is the same as the prevailing uses,
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
* Re: Ping: Re: [PATCH 2/3] VT-d: correct dma_msi_set_affinity()
  2016-12-15 12:52         ` Andrew Cooper
@ 2016-12-15 14:16           ` Jan Beulich
  2016-12-15 15:39             ` Andrew Cooper
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2016-12-15 14:16 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Kevin Tian, Feng Wu
>>> On 15.12.16 at 13:52, <andrew.cooper3@citrix.com> wrote:
> On 15/12/16 09:54, Jan Beulich wrote:
>>>>> On 09.12.16 at 09:47, <JBeulich@suse.com> wrote:
>>>>>> On 08.12.16 at 18:33, <andrew.cooper3@citrix.com> wrote:
>>>> On 08/12/16 16:01, Jan Beulich wrote:
>>>>> That commit ("VT-d: use msi_compose_msg()") together with 15aa6c6748
>>>> Which commit?
>>> Oops - initially I had intended the title to include the hash: 83cd2038fe.
>>> I've adjusted the text.
>>>
>>>>> ("amd iommu: use base platform MSI implementation") introducing the use
>>>>> of a per-CPU scratch CPU mask went too far: dma_msi_set_affinity() may,
>>>>> at least in theory, be called in interrupt context, and hence the use
>>>>> of that scratch variable is not correct.
>>>>>
>>>>> Since the function overwrites the destination information anyway,
>>>>> allow msi_compose_msg() to be called with a NULL CPU mask, avoiding the
>>>>> use of that scratch variable.
>>>> Which function overwrites what?  I can't see dma_msi_set_affinity()
>>>> doing anything to clobber msg.dest32, so I don't understand why this
>>>> change is correct.
>>> msg.dest32 simply isn't being used. msg is local to that function, so
>>> all that matters is which fields the function consumes. It uses only
>>> address and data, and updates address to properly specify the
>>> intended destination. To guard against stale data (in
>>> iommu->msi.msg), it may be reasonable to nevertheless set dest32
>>> before storing msg into that field.
>> Any further thoughts here? Do I need to resend with that one line
>> added?
> 
> msg is tiny.  I'd suggest just initialising to 0 at the start of the
> function.
Why? msi_compose_msg() does exactly that as its very first action.
Jan
* Re: [PATCH 3/3] x86: introduce and use scratch CPU mask
  2016-12-15 14:15       ` Andrew Cooper
@ 2016-12-15 14:59         ` Jan Beulich
  2016-12-15 15:38           ` Andrew Cooper
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2016-12-15 14:59 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel
>>> On 15.12.16 at 15:15, <andrew.cooper3@citrix.com> wrote:
> On 09/12/16 08:59, Jan Beulich wrote:
>>>>> On 08.12.16 at 18:51, <andrew.cooper3@citrix.com> wrote:
>>> On 08/12/16 16:02, Jan Beulich wrote:
>>>> variable. With an IRQ happening at the deepest point of the stack, and
>>>> with send_guest_pirq() being called from there (leading to vcpu_kick()
>>>> -> ... -> csched_vcpu_wake() -> __runq_tickle() ->
>>>> cpumask_raise_softirq(), the last two of which also have CPU mask
>>>> variables on their stacks), this has been observed to cause a stack
>>> "stacks), has been".
>> Hmm, that looks strange to me: Wouldn't the dropping of "this"
>> also require the "With" at the start of the sentence to be dropped
>> (and isn't the sentence okay with both left in place)?
> 
> The use of "this" requires a substantial backtrack through the text to
> evaluate what it refers to, which, as you point out, I didn't manage to do
> successfully.
> 
> As is evident, I had a very hard time trying to parse the sentence.
> 
> It would be clearer to read if you took out both the "With" and "this".
I've re-ordered it some more:
__get_page_type(), so far using an on-stack CPU mask variable, is
involved in recursion when e.g. pinning page tables. This means there
may be up to five instances of the function active at a time, implying
five instances of the (up to 512 bytes large) CPU mask variable. An IRQ
happening at the deepest point of the stack has been observed to cause
a stack overflow with a 4095-pCPU build, when the IRQ handling results
in send_guest_pirq() being called (leading to vcpu_kick() -> ... ->
csched_vcpu_wake() -> __runq_tickle() -> cpumask_raise_softirq(), the
last two of which also have CPU mask variables on their stacks).
Does this suit you better?
Jan
* Re: [PATCH 3/3] x86: introduce and use scratch CPU mask
  2016-12-15 14:59         ` Jan Beulich
@ 2016-12-15 15:38           ` Andrew Cooper
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Cooper @ 2016-12-15 15:38 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel
On 15/12/16 14:59, Jan Beulich wrote:
>>>> On 15.12.16 at 15:15, <andrew.cooper3@citrix.com> wrote:
>> On 09/12/16 08:59, Jan Beulich wrote:
>>>>>> On 08.12.16 at 18:51, <andrew.cooper3@citrix.com> wrote:
>>>> On 08/12/16 16:02, Jan Beulich wrote:
>>>>> variable. With an IRQ happening at the deepest point of the stack, and
>>>>> with send_guest_pirq() being called from there (leading to vcpu_kick()
>>>>> -> ... -> csched_vcpu_wake() -> __runq_tickle() ->
>>>>> cpumask_raise_softirq(), the last two of which also have CPU mask
>>>>> variables on their stacks), this has been observed to cause a stack
>>>> "stacks), has been".
>>> Hmm, that looks strange to me: Wouldn't the dropping of "this"
>>> also require the "With" at the start of the sentence to be dropped
>>> (and isn't the sentence okay with both left in place)?
>> The use of "this" requires a substantial backtrack through the text to
>> evaluate what it refers to, which, as you point out, I didn't manage to do
>> successfully.
>>
>> As is evident, I had a very hard time trying to parse the sentence.
>>
>> It would be clearer to read if you took out both the "With" and "this".
> I've re-ordered it some more:
>
> __get_page_type(), so far using an on-stack CPU mask variable, is
> involved in recursion when e.g. pinning page tables. This means there
> may be up to five instances of the function active at a time, implying
> five instances of the (up to 512 bytes large) CPU mask variable. An IRQ
> happening at the deepest point of the stack has been observed to cause
> a stack overflow with a 4095-pCPU build, when the IRQ handling results
> in send_guest_pirq() being called (leading to vcpu_kick() -> ... ->
> csched_vcpu_wake() -> __runq_tickle() -> cpumask_raise_softirq(), the
> last two of which also have CPU mask variables on their stacks).
>
> Does this suit you better?
Much clearer, thanks.
~Andrew
* Re: Ping: Re: [PATCH 2/3] VT-d: correct dma_msi_set_affinity()
  2016-12-15 14:16           ` Jan Beulich
@ 2016-12-15 15:39             ` Andrew Cooper
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Cooper @ 2016-12-15 15:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Kevin Tian, Feng Wu
On 15/12/16 14:16, Jan Beulich wrote:
>>>> On 15.12.16 at 13:52, <andrew.cooper3@citrix.com> wrote:
>> On 15/12/16 09:54, Jan Beulich wrote:
>>>>>> On 09.12.16 at 09:47, <JBeulich@suse.com> wrote:
>>>>>>> On 08.12.16 at 18:33, <andrew.cooper3@citrix.com> wrote:
>>>>> On 08/12/16 16:01, Jan Beulich wrote:
>>>>>> That commit ("VT-d: use msi_compose_msg()") together with 15aa6c6748
>>>>> Which commit?
>>>> Oops - initially I had intended the title to include the hash: 83cd2038fe.
>>>> I've adjusted the text.
>>>>
>>>>>> ("amd iommu: use base platform MSI implementation") introducing the use
>>>>>> of a per-CPU scratch CPU mask went too far: dma_msi_set_affinity() may,
>>>>>> at least in theory, be called in interrupt context, and hence the use
>>>>>> of that scratch variable is not correct.
>>>>>>
>>>>>> Since the function overwrites the destination information anyway,
>>>>>> allow msi_compose_msg() to be called with a NULL CPU mask, avoiding the
>>>>>> use of that scratch variable.
>>>>> Which function overwrites what?  I can't see dma_msi_set_affinity()
>>>>> doing anything to clobber msg.dest32, so I don't understand why this
>>>>> change is correct.
>>>> msg.dest32 simply isn't being used. msg is local to that function, so
>>>> all that matters is which fields the function consumes. It uses only
>>>> address and data, and updates address to properly specify the
>>>> intended destination. To guard against stale data (in
>>>> iommu->msi.msg), it may be reasonable to nevertheless set dest32
>>>> before storing msg into that field.
>>> Any further thoughts here? Do I need to resend with that one line
>>> added?
>> msg is tiny.  I'd suggest just initialising to 0 at the start of the
>> function.
> Why? msi_compose_msg() does exactly that as its very first action.
Perhaps it would be easier to post a v2 of just this patch.  I can't
work out which one line you are referring to.
~Andrew
end of thread, other threads:[~2016-12-15 15:42 UTC | newest]
Thread overview: 22+ messages
2016-12-08 15:54 [PATCH 0/3] misc cpumask adjustments Jan Beulich
2016-12-08 16:00 ` [PATCH 1/3] make tlbflush_filter()'s first parameter a pointer Jan Beulich
2016-12-08 16:26   ` Andrew Cooper
2016-12-09 10:39   ` Wei Liu
2016-12-09 18:05   ` Julien Grall
2016-12-08 16:01 ` [PATCH 2/3] VT-d: correct dma_msi_set_affinity() Jan Beulich
2016-12-08 17:33   ` Andrew Cooper
2016-12-09  8:47     ` Jan Beulich
2016-12-13  5:23       ` Tian, Kevin
2016-12-13  7:30         ` Jan Beulich
2016-12-14  4:58           ` Tian, Kevin
2016-12-15  9:54       ` Ping: " Jan Beulich
2016-12-15 12:52         ` Andrew Cooper
2016-12-15 14:16           ` Jan Beulich
2016-12-15 15:39             ` Andrew Cooper
2016-12-08 16:02 ` [PATCH 3/3] x86: introduce and use scratch CPU mask Jan Beulich
2016-12-08 17:51   ` Andrew Cooper
2016-12-09  8:59     ` Jan Beulich
2016-12-15  9:56       ` Ping: " Jan Beulich
2016-12-15 14:15       ` Andrew Cooper
2016-12-15 14:59         ` Jan Beulich
2016-12-15 15:38           ` Andrew Cooper