xen-devel.lists.xenproject.org archive mirror
* [PATCH 00/16] Implement 3-level event channel in Xen
@ 2013-01-31 14:42 Wei Liu
  2013-01-31 14:42 ` [PATCH 01/16] Remove trailing whitespaces in sched.h Wei Liu
                   ` (15 more replies)
  0 siblings, 16 replies; 32+ messages in thread
From: Wei Liu @ 2013-01-31 14:42 UTC (permalink / raw)
  To: xen-devel; +Cc: ian.campbell, jbeulich, david.vrabel

Changes from RFC V2:
* Adjust registration interface
* Get rid of xmalloc and friends in registration routine
* Avoid redirection with function pointers
* Share routines between 2 and 3 level event channels

Changes from RFC V1:
* Use function pointers to get rid of switch statements
* Do not manipulate VCPU state
* No more gcc-ism code in public headers
* Consolidate some boilerplate using macros

* [PATCH 01/16] Remove trailing whitespaces in sched.h
  2013-01-31 14:42 [PATCH 00/16] Implement 3-level event channel in Xen Wei Liu
@ 2013-01-31 14:42 ` Wei Liu
  2013-01-31 14:42 ` [PATCH 02/16] Remove trailing whitespaces in event.h Wei Liu
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Wei Liu @ 2013-01-31 14:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, ian.campbell, jbeulich, david.vrabel

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/include/xen/sched.h |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 90a6537..39f85d2 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -92,7 +92,7 @@ void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */
 
 struct waitqueue_vcpu;
 
-struct vcpu 
+struct vcpu
 {
     int              vcpu_id;
 
@@ -453,7 +453,7 @@ struct domain *domain_create(
 /*
  * rcu_lock_domain_by_id() is more efficient than get_domain_by_id().
  * This is the preferred function if the returned domain reference
- * is short lived,  but it cannot be used if the domain reference needs 
+ * is short lived,  but it cannot be used if the domain reference needs
  * to be kept beyond the current scope (e.g., across a softirq).
  * The returned domain reference must be discarded using rcu_unlock_domain().
  */
@@ -574,7 +574,7 @@ void sync_local_execstate(void);
  * sync_vcpu_execstate() will switch and commit @prev's state.
  */
 void context_switch(
-    struct vcpu *prev, 
+    struct vcpu *prev,
     struct vcpu *next);
 
 /*
-- 
1.7.10.4

* [PATCH 02/16] Remove trailing whitespaces in event.h
  2013-01-31 14:42 [PATCH 00/16] Implement 3-level event channel in Xen Wei Liu
  2013-01-31 14:42 ` [PATCH 01/16] Remove trailing whitespaces in sched.h Wei Liu
@ 2013-01-31 14:42 ` Wei Liu
  2013-01-31 14:42 ` [PATCH 03/16] Remove trailing whitespaces in xen.h Wei Liu
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Wei Liu @ 2013-01-31 14:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, ian.campbell, jbeulich, david.vrabel

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/include/xen/event.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 71c3e92..65ac81a 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -1,8 +1,8 @@
 /******************************************************************************
  * event.h
- * 
+ *
  * A nice interface for passing asynchronous events to guest OSes.
- * 
+ *
  * Copyright (c) 2002-2006, K A Fraser
  */
 
-- 
1.7.10.4

* [PATCH 03/16] Remove trailing whitespaces in xen.h
  2013-01-31 14:42 [PATCH 00/16] Implement 3-level event channel in Xen Wei Liu
  2013-01-31 14:42 ` [PATCH 01/16] Remove trailing whitespaces in sched.h Wei Liu
  2013-01-31 14:42 ` [PATCH 02/16] Remove trailing whitespaces in event.h Wei Liu
@ 2013-01-31 14:42 ` Wei Liu
  2013-01-31 14:42 ` [PATCH 04/16] Move event channel macros / struct definition to proper place Wei Liu
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Wei Liu @ 2013-01-31 14:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, ian.campbell, jbeulich, david.vrabel

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/include/public/xen.h |   22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 5593066..fe44eb5 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -1,8 +1,8 @@
 /******************************************************************************
  * xen.h
- * 
+ *
  * Guest OS interface to Xen.
- * 
+ *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to
  * deal in the Software without restriction, including without limitation the
@@ -137,11 +137,11 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_dom0_op __HYPERVISOR_platform_op
 #endif
 
-/* 
+/*
  * VIRTUAL INTERRUPTS
- * 
+ *
  * Virtual interrupts that a guest OS may receive from Xen.
- * 
+ *
  * In the side comments, 'V.' denotes a per-VCPU VIRQ while 'G.' denotes a
  * global VIRQ. The former can be bound once per VCPU and cannot be re-bound.
  * The latter can be allocated only once per guest: they must initially be
@@ -190,7 +190,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
  *                     (x) encodes the PFD as follows:
  *                     x == 0 => PFD == DOMID_SELF
  *                     x != 0 => PFD == x - 1
- * 
+ *
  * Sub-commands: ptr[1:0] specifies the appropriate MMU_* command.
  * -------------
  * ptr[1:0] == MMU_NORMAL_PT_UPDATE:
@@ -236,13 +236,13 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
  * To deallocate the pages, the operations are the reverse of the steps
  * mentioned above. The argument is MMUEXT_UNPIN_TABLE for all levels and the
  * pagetable MUST not be in use (meaning that the cr3 is not set to it).
- * 
+ *
  * ptr[1:0] == MMU_MACHPHYS_UPDATE:
  * Updates an entry in the machine->pseudo-physical mapping table.
  * ptr[:2]  -- Machine address within the frame whose mapping to modify.
  *             The frame must belong to the FD, if one is specified.
  * val      -- Value to write into the mapping entry.
- * 
+ *
  * ptr[1:0] == MMU_PT_UPDATE_PRESERVE_AD:
  * As MMU_NORMAL_PT_UPDATE above, but A/D bits currently in the PTE are ORed
  * with those in @val.
@@ -588,7 +588,7 @@ typedef struct vcpu_time_info vcpu_time_info_t;
 struct vcpu_info {
     /*
      * 'evtchn_upcall_pending' is written non-zero by Xen to indicate
-     * a pending notification for a particular VCPU. It is then cleared 
+     * a pending notification for a particular VCPU. It is then cleared
      * by the guest OS /before/ checking for pending work, thus avoiding
      * a set-and-check race. Note that the mask is only accessed by Xen
      * on the CPU that is currently hosting the VCPU. This means that the
@@ -646,7 +646,7 @@ struct shared_info {
      *  3. Virtual interrupts ('events'). A domain can bind an event-channel
      *     port to a virtual interrupt source, such as the virtual-timer
      *     device or the emergency console.
-     * 
+     *
      * Event channels are addressed by a "port index". Each channel is
      * associated with two bits of information:
      *  1. PENDING -- notifies the domain that there is a pending notification
@@ -657,7 +657,7 @@ struct shared_info {
      *     becomes pending while the channel is masked then the 'edge' is lost
      *     (i.e., when the channel is unmasked, the guest must manually handle
      *     pending notifications as no upcall will be scheduled by Xen).
-     * 
+     *
      * To expedite scanning of pending notifications, any 0->1 pending
      * transition on an unmasked channel causes a corresponding bit in a
      * per-vcpu selector word to be set. Each bit in the selector covers a
-- 
1.7.10.4

* [PATCH 04/16] Move event channel macros / struct definition to proper place
  2013-01-31 14:42 [PATCH 00/16] Implement 3-level event channel in Xen Wei Liu
                   ` (2 preceding siblings ...)
  2013-01-31 14:42 ` [PATCH 03/16] Remove trailing whitespaces in xen.h Wei Liu
@ 2013-01-31 14:42 ` Wei Liu
  2013-02-04  9:00   ` Jan Beulich
  2013-01-31 14:42 ` [PATCH 05/16] Add evtchn_level in struct domain Wei Liu
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Wei Liu @ 2013-01-31 14:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, ian.campbell, jbeulich, david.vrabel

These definitions were misplaced in sched.h; move them to their proper
places in xen.h and event.h.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/include/public/xen.h |    2 ++
 xen/include/xen/event.h  |   43 +++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/sched.h  |   45 ---------------------------------------------
 3 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index fe44eb5..6132682 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -557,6 +557,8 @@ DEFINE_XEN_GUEST_HANDLE(multicall_entry_t);
  *  1024 if a long is 32 bits; 4096 if a long is 64 bits.
  */
 #define NR_EVENT_CHANNELS (sizeof(unsigned long) * sizeof(unsigned long) * 64)
+#define EVTCHNS_PER_BUCKET 128
+#define NR_EVTCHN_BUCKETS  (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET)
 
 struct vcpu_time_info {
     /*
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 65ac81a..1c13bd0 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -15,6 +15,49 @@
 #include <asm/bitops.h>
 #include <asm/event.h>
 
+#ifndef CONFIG_COMPAT
+#define BITS_PER_EVTCHN_WORD(d) BITS_PER_LONG
+#else
+#define BITS_PER_EVTCHN_WORD(d) (has_32bit_shinfo(d) ? 32 : BITS_PER_LONG)
+#endif
+#define MAX_EVTCHNS(d) (BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d))
+
+struct evtchn
+{
+#define ECS_FREE         0 /* Channel is available for use.                  */
+#define ECS_RESERVED     1 /* Channel is reserved.                           */
+#define ECS_UNBOUND      2 /* Channel is waiting to bind to a remote domain. */
+#define ECS_INTERDOMAIN  3 /* Channel is bound to another domain.            */
+#define ECS_PIRQ         4 /* Channel is bound to a physical IRQ line.       */
+#define ECS_VIRQ         5 /* Channel is bound to a virtual IRQ line.        */
+#define ECS_IPI          6 /* Channel is bound to a virtual IPI line.        */
+    u8  state;             /* ECS_* */
+    u8  xen_consumer;      /* Consumer in Xen, if any? (0 = send to guest) */
+    u16 notify_vcpu_id;    /* VCPU for local delivery notification */
+    union {
+        struct {
+            domid_t remote_domid;
+        } unbound;     /* state == ECS_UNBOUND */
+        struct {
+            u16            remote_port;
+            struct domain *remote_dom;
+        } interdomain; /* state == ECS_INTERDOMAIN */
+        struct {
+            u16            irq;
+            u16            next_port;
+            u16            prev_port;
+        } pirq;        /* state == ECS_PIRQ */
+        u16 virq;      /* state == ECS_VIRQ */
+    } u;
+#ifdef FLASK_ENABLE
+    void *ssid;
+#endif
+};
+
+int  evtchn_init(struct domain *d); /* from domain_create */
+void evtchn_destroy(struct domain *d); /* from domain_kill */
+void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */
+
 /*
  * send_guest_vcpu_virq: Notify guest via a per-VCPU VIRQ.
  *  @v:        VCPU to which virtual IRQ should be sent
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 39f85d2..64a0ba4 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -45,51 +45,6 @@ DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t);
 /* A global pointer to the initial domain (DOM0). */
 extern struct domain *dom0;
 
-#ifndef CONFIG_COMPAT
-#define BITS_PER_EVTCHN_WORD(d) BITS_PER_LONG
-#else
-#define BITS_PER_EVTCHN_WORD(d) (has_32bit_shinfo(d) ? 32 : BITS_PER_LONG)
-#endif
-#define MAX_EVTCHNS(d) (BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d))
-#define EVTCHNS_PER_BUCKET 128
-#define NR_EVTCHN_BUCKETS  (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET)
-
-struct evtchn
-{
-#define ECS_FREE         0 /* Channel is available for use.                  */
-#define ECS_RESERVED     1 /* Channel is reserved.                           */
-#define ECS_UNBOUND      2 /* Channel is waiting to bind to a remote domain. */
-#define ECS_INTERDOMAIN  3 /* Channel is bound to another domain.            */
-#define ECS_PIRQ         4 /* Channel is bound to a physical IRQ line.       */
-#define ECS_VIRQ         5 /* Channel is bound to a virtual IRQ line.        */
-#define ECS_IPI          6 /* Channel is bound to a virtual IPI line.        */
-    u8  state;             /* ECS_* */
-    u8  xen_consumer;      /* Consumer in Xen, if any? (0 = send to guest) */
-    u16 notify_vcpu_id;    /* VCPU for local delivery notification */
-    union {
-        struct {
-            domid_t remote_domid;
-        } unbound;     /* state == ECS_UNBOUND */
-        struct {
-            u16            remote_port;
-            struct domain *remote_dom;
-        } interdomain; /* state == ECS_INTERDOMAIN */
-        struct {
-            u16            irq;
-            u16            next_port;
-            u16            prev_port;
-        } pirq;        /* state == ECS_PIRQ */
-        u16 virq;      /* state == ECS_VIRQ */
-    } u;
-#ifdef FLASK_ENABLE
-    void *ssid;
-#endif
-};
-
-int  evtchn_init(struct domain *d); /* from domain_create */
-void evtchn_destroy(struct domain *d); /* from domain_kill */
-void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */
-
 struct waitqueue_vcpu;
 
 struct vcpu
-- 
1.7.10.4

* [PATCH 05/16] Add evtchn_level in struct domain
  2013-01-31 14:42 [PATCH 00/16] Implement 3-level event channel in Xen Wei Liu
                   ` (3 preceding siblings ...)
  2013-01-31 14:42 ` [PATCH 04/16] Move event channel macros / struct definition to proper place Wei Liu
@ 2013-01-31 14:42 ` Wei Liu
  2013-01-31 14:42 ` [PATCH 06/16] Dynamically allocate d->evtchn Wei Liu
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Wei Liu @ 2013-01-31 14:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, ian.campbell, jbeulich, david.vrabel

This field is manipulated by the hypervisor only, so if anything goes
wrong it is a bug.

The default event channel level is 2, which uses a two-level lookup
structure: a selector in struct vcpu and a shared bitmap in the shared
info page.

The upcoming 3-level event channel uses a three-level lookup structure:
a top-level selector and a second-level selector for every vcpu, plus
shared bitmaps.

When constructed, a domain starts with a 2-level event channel, which is
guaranteed to be supported by the hypervisor. If a domain wants to use
an N-level (N>=3) event channel, it must explicitly issue a hypercall to
set it up.
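
The three-level walk described above can be sketched in plain C. This is
an illustrative layout with hypothetical helper names only, not the
actual guest ABI (which later patches in this series define): a set bit
in the top-level selector points at a second-level selector word, and a
set bit there points at a word of the shared pending bitmap.

```c
#include <assert.h>
#include <string.h>

#define BITS_PER_EVTCHN_WORD ((int)(8 * sizeof(unsigned long)))
#define MAX_L3 (BITS_PER_EVTCHN_WORD * BITS_PER_EVTCHN_WORD * BITS_PER_EVTCHN_WORD)

/* Hypothetical guest-side view of the 3-level lookup state. */
struct evtchn_l3 {
    unsigned long top;                                    /* level 1 */
    unsigned long sel[BITS_PER_EVTCHN_WORD];              /* level 2 */
    unsigned long pending[MAX_L3 / BITS_PER_EVTCHN_WORD]; /* level 3 */
};

/* Scan the three levels; return the first pending port, or -1. */
static long evtchn_l3_scan(const struct evtchn_l3 *e)
{
    int W = BITS_PER_EVTCHN_WORD;
    for (int t = 0; t < W; t++) {
        if (!(e->top & (1UL << t)))
            continue;
        for (int s = 0; s < W; s++) {
            if (!(e->sel[t] & (1UL << s)))
                continue;
            unsigned long word = (unsigned long)t * W + s;
            for (int b = 0; b < W; b++)
                if (e->pending[word] & (1UL << b))
                    return (long)(word * W + b);
        }
    }
    return -1;
}

/* Marking a port pending sets bits at all three levels. */
static void evtchn_l3_set_pending(struct evtchn_l3 *e, unsigned long port)
{
    int W = BITS_PER_EVTCHN_WORD;
    unsigned long word = port / W;
    e->pending[word] |= 1UL << (port % W);
    e->sel[word / W] |= 1UL << (word % W);
    e->top           |= 1UL << (word / W);
}
```

On a 64-bit build this covers 64 * 64 * 64 = 262144 ports, versus
64 * 64 = 4096 for the two-level structure.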

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/event_channel.c |    1 +
 xen/include/xen/event.h    |   16 +++++++++++++++-
 xen/include/xen/sched.h    |    1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 9231eb0..b96d5b1 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1173,6 +1173,7 @@ void notify_via_xen_event_channel(struct domain *ld, int lport)
 int evtchn_init(struct domain *d)
 {
     spin_lock_init(&d->event_lock);
+    d->evtchn_level = EVTCHN_DEFAULT_LEVEL;
     if ( get_free_port(d) != 0 )
         return -EINVAL;
     evtchn_from_port(d, 0)->state = ECS_RESERVED;
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 1c13bd0..c17b891 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -20,7 +20,21 @@
 #else
 #define BITS_PER_EVTCHN_WORD(d) (has_32bit_shinfo(d) ? 32 : BITS_PER_LONG)
 #endif
-#define MAX_EVTCHNS(d) (BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d))
+#define EVTCHN_2_LEVEL       2
+#define EVTCHN_3_LEVEL       3
+#define EVTCHN_DEFAULT_LEVEL EVTCHN_2_LEVEL
+#define MAX_EVTCHNS_L2(d) (BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d))
+#define MAX_EVTCHNS_L3(d) (MAX_EVTCHNS_L2(d) * BITS_PER_EVTCHN_WORD(d))
+#define MAX_EVTCHNS(d) ({ int __v = 0;				\
+			switch ( d->evtchn_level ) {		\
+			case EVTCHN_2_LEVEL:			\
+				__v = MAX_EVTCHNS_L2(d); break; \
+			case EVTCHN_3_LEVEL:			\
+				__v = MAX_EVTCHNS_L3(d); break; \
+			default:				\
+				BUG();                          \
+			};					\
+			__v;})
 
 struct evtchn
 {
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 64a0ba4..21f7b68 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -217,6 +217,7 @@ struct domain
     /* Event channel information. */
     struct evtchn   *evtchn[NR_EVTCHN_BUCKETS];
     spinlock_t       event_lock;
+    unsigned int     evtchn_level;
 
     struct grant_table *grant_table;
 
-- 
1.7.10.4

* [PATCH 06/16] Dynamically allocate d->evtchn
  2013-01-31 14:42 [PATCH 00/16] Implement 3-level event channel in Xen Wei Liu
                   ` (4 preceding siblings ...)
  2013-01-31 14:42 ` [PATCH 05/16] Add evtchn_level in struct domain Wei Liu
@ 2013-01-31 14:42 ` Wei Liu
  2013-01-31 14:42 ` [PATCH 07/16] Bump EVTCHNS_PER_BUCKET to 512 Wei Liu
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Wei Liu @ 2013-01-31 14:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, ian.campbell, jbeulich, david.vrabel

As we move to N-level event channels we need a bigger d->evtchn, which
would bloat struct domain. So move this array out of struct domain and
allocate a dedicated page for it.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/event_channel.c |   17 +++++++++++++++--
 xen/include/xen/sched.h    |    2 +-
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index b96d5b1..43ee854 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1172,16 +1172,27 @@ void notify_via_xen_event_channel(struct domain *ld, int lport)
 
 int evtchn_init(struct domain *d)
 {
+    BUILD_BUG_ON(sizeof(struct evtchn *) * NR_EVTCHN_BUCKETS > PAGE_SIZE);
+    d->evtchn = alloc_xenheap_page();
+
+    if ( d->evtchn == NULL )
+        return -ENOMEM;
+    clear_page(d->evtchn);
+
     spin_lock_init(&d->event_lock);
     d->evtchn_level = EVTCHN_DEFAULT_LEVEL;
-    if ( get_free_port(d) != 0 )
+    if ( get_free_port(d) != 0 ) {
+        free_xenheap_page(d->evtchn);
         return -EINVAL;
+    }
     evtchn_from_port(d, 0)->state = ECS_RESERVED;
 
 #if MAX_VIRT_CPUS > BITS_PER_LONG
     d->poll_mask = xmalloc_array(unsigned long, BITS_TO_LONGS(MAX_VIRT_CPUS));
-    if ( !d->poll_mask )
+    if ( !d->poll_mask ) {
+        free_xenheap_page(d->evtchn);
         return -ENOMEM;
+    }
     bitmap_zero(d->poll_mask, MAX_VIRT_CPUS);
 #endif
 
@@ -1215,6 +1226,8 @@ void evtchn_destroy(struct domain *d)
     spin_unlock(&d->event_lock);
 
     clear_global_virq_handlers(d);
+
+    free_xenheap_page(d->evtchn);
 }
 
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 21f7b68..2f18fe5 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -215,7 +215,7 @@ struct domain
     spinlock_t       rangesets_lock;
 
     /* Event channel information. */
-    struct evtchn   *evtchn[NR_EVTCHN_BUCKETS];
+    struct evtchn  **evtchn;
     spinlock_t       event_lock;
     unsigned int     evtchn_level;
 
-- 
1.7.10.4

* [PATCH 07/16] Bump EVTCHNS_PER_BUCKET to 512
  2013-01-31 14:42 [PATCH 00/16] Implement 3-level event channel in Xen Wei Liu
                   ` (5 preceding siblings ...)
  2013-01-31 14:42 ` [PATCH 06/16] Dynamically allocate d->evtchn Wei Liu
@ 2013-01-31 14:42 ` Wei Liu
  2013-01-31 14:42 ` [PATCH 08/16] Add evtchn_is_{pending, masked} and evtchn_clear_pending Wei Liu
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Wei Liu @ 2013-01-31 14:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, ian.campbell, jbeulich, david.vrabel

For a 64-bit build with 3-level event channels and the original value of
EVTCHNS_PER_BUCKET (128), the space needed to accommodate d->evtchn
would be 4 pages (PAGE_SIZE = 4096). Given that not every domain needs
3-level event channels, this wastes memory. Also, d->evtchn has been
restricted to one page, so Xen cannot build if we move to 3-level event
channels.

Setting EVTCHNS_PER_BUCKET to 512 makes d->evtchn occupy exactly one page.
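
The sizing argument can be verified with a few lines of arithmetic (a
sketch assuming a 64-bit build, i.e. sizeof(unsigned long) == 8, and
4 KiB pages; the helper name is made up for illustration):

```c
#include <assert.h>

#define PAGE_SIZE            4096UL
#define PTR_BYTES            8UL                         /* 64-bit build */
#define NR_EVENT_CHANNELS_L2 (8UL * 8UL * 64UL)          /* 4096 ports   */
#define NR_EVENT_CHANNELS_L3 (NR_EVENT_CHANNELS_L2 * 64) /* 262144 ports */

/* Bytes taken by d->evtchn: one pointer per bucket. */
static unsigned long bucket_array_bytes(unsigned long nr_channels,
                                        unsigned long per_bucket)
{
    return (nr_channels / per_bucket) * PTR_BYTES;
}
```

With 128 channels per bucket, 262144 / 128 = 2048 buckets need 16 KiB
(4 pages) of pointers; with 512 per bucket, 512 buckets need exactly
4 KiB (one page).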

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/include/public/xen.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 6132682..4a354e1 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -557,7 +557,7 @@ DEFINE_XEN_GUEST_HANDLE(multicall_entry_t);
  *  1024 if a long is 32 bits; 4096 if a long is 64 bits.
  */
 #define NR_EVENT_CHANNELS (sizeof(unsigned long) * sizeof(unsigned long) * 64)
-#define EVTCHNS_PER_BUCKET 128
+#define EVTCHNS_PER_BUCKET 512
 #define NR_EVTCHN_BUCKETS  (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET)
 
 struct vcpu_time_info {
-- 
1.7.10.4

* [PATCH 08/16] Add evtchn_is_{pending, masked} and evtchn_clear_pending
  2013-01-31 14:42 [PATCH 00/16] Implement 3-level event channel in Xen Wei Liu
                   ` (6 preceding siblings ...)
  2013-01-31 14:42 ` [PATCH 07/16] Bump EVTCHNS_PER_BUCKET to 512 Wei Liu
@ 2013-01-31 14:42 ` Wei Liu
  2013-01-31 14:42 ` [PATCH 09/16] Introduce some macros for event channels Wei Liu
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Wei Liu @ 2013-01-31 14:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, ian.campbell, jbeulich, david.vrabel

Some code paths access the arrays in shared info directly. This only
works with 2-level event channels.

Add functions to abstract away implementation details.
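
The shape of the abstraction can be sketched as follows (a minimal
stand-alone model: the struct and bit helper are simplified stand-ins
for shared info and Xen's test_bit, used here only to show why callers
should go through accessors rather than touch the bitmaps directly):

```c
#include <assert.h>

#define BITS_PER_WORD ((int)(8 * sizeof(unsigned long)))
#define NR_PORTS      1024

/* Stand-in for the shared-info bitmaps (illustrative only). */
struct fake_shared_info {
    unsigned long evtchn_pending[NR_PORTS / 32];
    unsigned long evtchn_mask[NR_PORTS / 32];
};

static int test_bit(int nr, const unsigned long *addr)
{
    return (int)((addr[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL);
}

/* The accessors hide the 2-level layout behind a stable interface, so
 * callers keep working unchanged when a 3-level layout is introduced. */
static int evtchn_is_pending(const struct fake_shared_info *s, int port)
{
    return test_bit(port, s->evtchn_pending);
}

static int evtchn_is_masked(const struct fake_shared_info *s, int port)
{
    return test_bit(port, s->evtchn_mask);
}
```

Only the bodies of the accessors need to learn about new levels; the
call sites converted in this patch stay as they are.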

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/irq.c         |    7 +++----
 xen/common/event_channel.c |   22 +++++++++++++++++++---
 xen/common/keyhandler.c    |    6 ++----
 xen/common/schedule.c      |    2 +-
 xen/include/xen/event.h    |    6 ++++++
 5 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 068c5a0..216271b 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1452,7 +1452,7 @@ int pirq_guest_unmask(struct domain *d)
         {
             pirq = pirqs[i]->pirq;
             if ( pirqs[i]->masked &&
-                 !test_bit(pirqs[i]->evtchn, &shared_info(d, evtchn_mask)) )
+                 !evtchn_is_masked(d, pirqs[i]->evtchn) )
                 pirq_guest_eoi(pirqs[i]);
         }
     } while ( ++pirq < d->nr_pirqs && n == ARRAY_SIZE(pirqs) );
@@ -2093,13 +2093,12 @@ static void dump_irqs(unsigned char key)
                 info = pirq_info(d, pirq);
                 printk("%u:%3d(%c%c%c%c)",
                        d->domain_id, pirq,
-                       (test_bit(info->evtchn,
-                                 &shared_info(d, evtchn_pending)) ?
+                       (evtchn_is_pending(d, info->evtchn) ?
                         'P' : '-'),
                        (test_bit(info->evtchn / BITS_PER_EVTCHN_WORD(d),
                                  &vcpu_info(d->vcpu[0], evtchn_pending_sel)) ?
                         'S' : '-'),
-                       (test_bit(info->evtchn, &shared_info(d, evtchn_mask)) ?
+                       (evtchn_is_masked(d, info->evtchn) ?
                         'M' : '-'),
                        (info->masked ? 'M' : '-'));
                 if ( i != action->nr_guests )
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 43ee854..37fecee 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -95,6 +95,7 @@ static uint8_t get_xen_consumer(xen_event_channel_notification_t fn)
 #define xen_notification_fn(e) (xen_consumers[(e)->xen_consumer-1])
 
 static void evtchn_set_pending(struct vcpu *v, int port);
+static void evtchn_clear_pending(struct domain *d, int port);
 
 static int virq_is_global(uint32_t virq)
 {
@@ -156,6 +157,16 @@ static int get_free_port(struct domain *d)
     return port;
 }
 
+int evtchn_is_pending(struct domain *d, int port)
+{
+    return test_bit(port, &shared_info(d, evtchn_pending));
+}
+
+int evtchn_is_masked(struct domain *d, int port)
+{
+    return test_bit(port, &shared_info(d, evtchn_mask));
+}
+
 
 static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc)
 {
@@ -529,7 +540,7 @@ static long __evtchn_close(struct domain *d1, int port1)
     }
 
     /* Clear pending event to avoid unexpected behavior on re-bind. */
-    clear_bit(port1, &shared_info(d1, evtchn_pending));
+    evtchn_clear_pending(d1, port1);
 
     /* Reset binding to vcpu0 when the channel is freed. */
     chn1->state          = ECS_FREE;
@@ -653,6 +664,11 @@ static void evtchn_set_pending(struct vcpu *v, int port)
     }
 }
 
+static void evtchn_clear_pending(struct domain *d, int port)
+{
+    clear_bit(port, &shared_info(d, evtchn_pending));
+}
+
 int guest_enabled_event(struct vcpu *v, uint32_t virq)
 {
     return ((v != NULL) && (v->virq_to_evtchn[virq] != 0));
@@ -1283,8 +1299,8 @@ static void domain_dump_evtchn_info(struct domain *d)
 
         printk("    %4u [%d/%d]: s=%d n=%d x=%d",
                port,
-               !!test_bit(port, &shared_info(d, evtchn_pending)),
-               !!test_bit(port, &shared_info(d, evtchn_mask)),
+               !!evtchn_is_pending(d, port),
+               !!evtchn_is_masked(d, port),
                chn->state, chn->notify_vcpu_id, chn->xen_consumer);
 
         switch ( chn->state )
diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
index 2c5c230..16bc452 100644
--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -301,10 +301,8 @@ static void dump_domains(unsigned char key)
             printk("Notifying guest %d:%d (virq %d, port %d, stat %d/%d/%d)\n",
                    d->domain_id, v->vcpu_id,
                    VIRQ_DEBUG, v->virq_to_evtchn[VIRQ_DEBUG],
-                   test_bit(v->virq_to_evtchn[VIRQ_DEBUG], 
-                            &shared_info(d, evtchn_pending)),
-                   test_bit(v->virq_to_evtchn[VIRQ_DEBUG], 
-                            &shared_info(d, evtchn_mask)),
+                   evtchn_is_pending(d, v->virq_to_evtchn[VIRQ_DEBUG]),
+                   evtchn_is_masked(d, v->virq_to_evtchn[VIRQ_DEBUG]),
                    test_bit(v->virq_to_evtchn[VIRQ_DEBUG] /
                             BITS_PER_EVTCHN_WORD(d),
                             &vcpu_info(v, evtchn_pending_sel)));
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index e6a90d8..1bf010e 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -693,7 +693,7 @@ static long do_poll(struct sched_poll *sched_poll)
             goto out;
 
         rc = 0;
-        if ( test_bit(port, &shared_info(d, evtchn_pending)) )
+        if ( evtchn_is_pending(d, port) )
             goto out;
     }
 
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index c17b891..2d2c585 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -111,6 +111,12 @@ int evtchn_unmask(unsigned int port);
 /* Move all PIRQs after a vCPU was moved to another pCPU. */
 void evtchn_move_pirqs(struct vcpu *v);
 
+/* Tell whether a given event-channel port is pending */
+int evtchn_is_pending(struct domain *d, int port);
+
+/* Tell whether a given event-channel port is masked */
+int evtchn_is_masked(struct domain *d, int port);
+
 /* Allocate/free a Xen-attached event channel port. */
 typedef void (*xen_event_channel_notification_t)(
     struct vcpu *v, unsigned int port);
-- 
1.7.10.4

* [PATCH 09/16] Introduce some macros for event channels
  2013-01-31 14:42 [PATCH 00/16] Implement 3-level event channel in Xen Wei Liu
                   ` (7 preceding siblings ...)
  2013-01-31 14:42 ` [PATCH 08/16] Add evtchn_is_{pending, masked} and evtchn_clear_pending Wei Liu
@ 2013-01-31 14:42 ` Wei Liu
  2013-01-31 14:42 ` [PATCH 10/16] Update Xen public header Wei Liu
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Wei Liu @ 2013-01-31 14:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, ian.campbell, jbeulich, david.vrabel

For N-level event channels, the shared bitmaps in the hypervisor are by
design not guaranteed to be contiguous.

These macros calculate the page number and the offset within a page for
a given event channel.
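
Concretely, with the x86 values (PAGE_SHIFT = 12 for 4 KiB pages and
BYTE_BITORDER = 3 for 8 bits per byte), one page of bitmap covers
2^15 = 32768 event channels, and the macros from the diff below split a
port number into a page number and an in-page offset:

```c
#include <assert.h>

#define PAGE_SHIFT     12
#define BYTE_BITORDER  3  /* log2(bits per byte) */
#define EVTCHNS_SHIFT  (PAGE_SHIFT + BYTE_BITORDER)
#define EVTCHNS_PER_PAGE (1UL << EVTCHNS_SHIFT)  /* 32768 ports per page */
#define EVTCHN_MASK    (~(EVTCHNS_PER_PAGE - 1))
#define EVTCHN_PAGE_NO(chn)        ((chn) >> EVTCHNS_SHIFT)
#define EVTCHN_OFFSET_IN_PAGE(chn) ((chn) & ~EVTCHN_MASK)
```

For example, port 40000 lives in bitmap page 1 at bit offset 7232, and
port 70000 in page 2 at offset 4464.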

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/include/asm-arm/types.h  |    7 +++++--
 xen/include/asm-x86/config.h |    4 +++-
 xen/include/xen/event.h      |   13 +++++++++++++
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/xen/include/asm-arm/types.h b/xen/include/asm-arm/types.h
index 48864f9..65562b8 100644
--- a/xen/include/asm-arm/types.h
+++ b/xen/include/asm-arm/types.h
@@ -41,10 +41,13 @@ typedef char bool_t;
 #define test_and_clear_bool(b) xchg(&(b), 0)
 
 #endif /* __ASSEMBLY__ */
+#define BYTE_BITORDER  3
+#define BITS_PER_BYTE  (1 << BYTE_BITORDER)
 
-#define BITS_PER_LONG 32
-#define BYTES_PER_LONG 4
+#define BITS_PER_LONG  (1 << LONG_BITORDER)
 #define LONG_BYTEORDER 2
+#define LONG_BITORDER  (LONG_BYTEORDER + BYTE_BITORDER)
+#define BYTES_PER_LONG (1 << LONG_BYTEORDER)
 
 #endif /* __ARM_TYPES_H__ */
 /*
diff --git a/xen/include/asm-x86/config.h b/xen/include/asm-x86/config.h
index da82e73..b921586 100644
--- a/xen/include/asm-x86/config.h
+++ b/xen/include/asm-x86/config.h
@@ -8,11 +8,13 @@
 #define __X86_CONFIG_H__
 
 #define LONG_BYTEORDER 3
+#define BYTE_BITORDER 3
+#define LONG_BITORDER (BYTE_BITORDER + LONG_BYTEORDER)
 #define CONFIG_PAGING_LEVELS 4
 
 #define BYTES_PER_LONG (1 << LONG_BYTEORDER)
 #define BITS_PER_LONG (BYTES_PER_LONG << 3)
-#define BITS_PER_BYTE 8
+#define BITS_PER_BYTE (1 << BYTE_BITORDER)
 
 #define CONFIG_X86 1
 #define CONFIG_X86_HT 1
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 2d2c585..cacd89d 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -36,6 +36,19 @@
 			};					\
 			__v;})
 
+/* N.B. EVTCHNS_PER_PAGE is always a power of 2; use shifts to optimize */
+#define EVTCHNS_SHIFT (PAGE_SHIFT+BYTE_BITORDER)
+#define EVTCHNS_PER_PAGE (_AC(1,L) << EVTCHNS_SHIFT)
+#define EVTCHN_MASK (~(EVTCHNS_PER_PAGE-1))
+#define EVTCHN_PAGE_NO(chn) ((chn) >> EVTCHNS_SHIFT)
+#define EVTCHN_OFFSET_IN_PAGE(chn) ((chn) & ~EVTCHN_MASK)
+
+#ifndef CONFIG_COMPAT
+#define EVTCHN_WORD_BITORDER(d) LONG_BITORDER
+#else
+#define EVTCHN_WORD_BITORDER(d) (has_32bit_shinfo(d) ? 5 : LONG_BITORDER)
+#endif
+
 struct evtchn
 {
 #define ECS_FREE         0 /* Channel is available for use.                  */
-- 
1.7.10.4

* [PATCH 10/16] Update Xen public header
  2013-01-31 14:42 [PATCH 00/16] Implement 3-level event channel in Xen Wei Liu
                   ` (8 preceding siblings ...)
  2013-01-31 14:42 ` [PATCH 09/16] Introduce some macros for event channels Wei Liu
@ 2013-01-31 14:42 ` Wei Liu
  2013-01-31 14:42 ` [PATCH 11/16] Define N-level event channel registration interface Wei Liu
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Wei Liu @ 2013-01-31 14:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, ian.campbell, jbeulich, david.vrabel

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/include/public/xen.h |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 4a354e1..2e2ec7f 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -554,11 +554,19 @@ DEFINE_XEN_GUEST_HANDLE(multicall_entry_t);
 
 /*
  * Event channel endpoints per domain:
+ * 2-level:
  *  1024 if a long is 32 bits; 4096 if a long is 64 bits.
+ * 3-level:
+ *  32k if a long is 32 bits; 256k if a long is 64 bits.
  */
-#define NR_EVENT_CHANNELS (sizeof(unsigned long) * sizeof(unsigned long) * 64)
+#define NR_EVENT_CHANNELS_L2 (sizeof(unsigned long) * sizeof(unsigned long) * 64)
+#define NR_EVENT_CHANNELS_L3 (NR_EVENT_CHANNELS_L2 * sizeof(unsigned long) * 8)
+#if !defined(__XEN__) && !defined(__XEN_TOOLS__)
+#define NR_EVENT_CHANNELS NR_EVENT_CHANNELS_L2 /* for compatibility */
+#endif
+
 #define EVTCHNS_PER_BUCKET 512
-#define NR_EVTCHN_BUCKETS  (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET)
+#define NR_EVTCHN_BUCKETS  (NR_EVENT_CHANNELS_L2 / EVTCHNS_PER_BUCKET)
 
 struct vcpu_time_info {
     /*
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 11/16] Define N-level event channel registration interface
  2013-01-31 14:42 [PATCH 00/16] Implement 3-level event channel in Xen Wei Liu
                   ` (9 preceding siblings ...)
  2013-01-31 14:42 ` [PATCH 10/16] Update Xen public header Wei Liu
@ 2013-01-31 14:42 ` Wei Liu
  2013-01-31 14:43 ` [PATCH 12/16] Add control structures for 3-level event channel Wei Liu
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Wei Liu @ 2013-01-31 14:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, ian.campbell, jbeulich, david.vrabel

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/include/public/event_channel.h |   33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/xen/include/public/event_channel.h b/xen/include/public/event_channel.h
index 07ff321..f26d6d5 100644
--- a/xen/include/public/event_channel.h
+++ b/xen/include/public/event_channel.h
@@ -71,6 +71,7 @@
 #define EVTCHNOP_bind_vcpu        8
 #define EVTCHNOP_unmask           9
 #define EVTCHNOP_reset           10
+#define EVTCHNOP_register_nlevel 11
 /* ` } */
 
 typedef uint32_t evtchn_port_t;
@@ -258,6 +259,38 @@ struct evtchn_reset {
 typedef struct evtchn_reset evtchn_reset_t;
 
 /*
+ * EVTCHNOP_register_nlevel: Register N-level event channel
+ * NOTES:
+ *  1. Currently only 3-level is supported.
+ *  2. Should fall back to 2-level if this call fails.
+ */
+/* 64-bit guests need 8 pages for evtchn_pending and evtchn_mask to
+ * cover 256k event channels, while 32-bit guests need only 1 page
+ * for 32k event channels. */
+#define EVTCHN_MAX_L3_PAGES 8
+struct evtchn_register_3level {
+    /* IN parameters. */
+    uint32_t nr_pages;          /* for evtchn_{pending,mask} */
+    uint32_t nr_vcpus;          /* for l2sel_{mfns,offsets} */
+    XEN_GUEST_HANDLE(xen_pfn_t) evtchn_pending;
+    XEN_GUEST_HANDLE(xen_pfn_t) evtchn_mask;
+    XEN_GUEST_HANDLE(xen_pfn_t) l2sel_mfns;
+    XEN_GUEST_HANDLE(xen_pfn_t) l2sel_offsets;
+};
+typedef struct evtchn_register_3level evtchn_register_3level_t;
+DEFINE_XEN_GUEST_HANDLE(evtchn_register_3level_t);
+
+struct evtchn_register_nlevel {
+    /* IN parameters. */
+    uint32_t level;
+    union {
+        evtchn_register_3level_t l3;
+    } u;
+};
+typedef struct evtchn_register_nlevel evtchn_register_nlevel_t;
+DEFINE_XEN_GUEST_HANDLE(evtchn_register_nlevel_t);
+
+/*
  * ` enum neg_errnoval
  * ` HYPERVISOR_event_channel_op_compat(struct evtchn_op *op)
  * `
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 12/16] Add control structures for 3-level event channel
  2013-01-31 14:42 [PATCH 00/16] Implement 3-level event channel in Xen Wei Liu
                   ` (10 preceding siblings ...)
  2013-01-31 14:42 ` [PATCH 11/16] Define N-level event channel registration interface Wei Liu
@ 2013-01-31 14:43 ` Wei Liu
  2013-01-31 14:43 ` [PATCH 13/16] Make NR_EVTCHN_BUCKETS 3-level ready Wei Liu
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Wei Liu @ 2013-01-31 14:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, ian.campbell, jbeulich, david.vrabel

References to the shared pending / mask bitmaps are embedded in struct domain,
and a pointer to the second-level selector is embedded in struct vcpu.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/include/xen/sched.h |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 2f18fe5..1d8c1b5 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -24,6 +24,7 @@
 #include <public/sysctl.h>
 #include <public/vcpu.h>
 #include <public/mem_event.h>
+#include <public/event_channel.h>
 
 #ifdef CONFIG_COMPAT
 #include <compat/vcpu.h>
@@ -57,6 +58,9 @@ struct vcpu
 
     struct domain   *domain;
 
+    /* For 3-level event channels */
+    unsigned long   *evtchn_pending_sel_l2;
+
     struct vcpu     *next_in_list;
 
     s_time_t         periodic_period;
@@ -218,6 +222,8 @@ struct domain
     struct evtchn  **evtchn;
     spinlock_t       event_lock;
     unsigned int     evtchn_level;
+    unsigned long   *evtchn_pending[EVTCHN_MAX_L3_PAGES];
+    unsigned long   *evtchn_mask[EVTCHN_MAX_L3_PAGES];
 
     struct grant_table *grant_table;
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 13/16] Make NR_EVTCHN_BUCKETS 3-level ready
  2013-01-31 14:42 [PATCH 00/16] Implement 3-level event channel in Xen Wei Liu
                   ` (11 preceding siblings ...)
  2013-01-31 14:43 ` [PATCH 12/16] Add control structures for 3-level event channel Wei Liu
@ 2013-01-31 14:43 ` Wei Liu
  2013-01-31 14:43 ` [PATCH 14/16] Generalized event channel operations Wei Liu
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Wei Liu @ 2013-01-31 14:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, ian.campbell, jbeulich, david.vrabel

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/include/public/xen.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 2e2ec7f..8fecd07 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -566,7 +566,7 @@ DEFINE_XEN_GUEST_HANDLE(multicall_entry_t);
 #endif
 
 #define EVTCHNS_PER_BUCKET 512
-#define NR_EVTCHN_BUCKETS  (NR_EVENT_CHANNELS_L2 / EVTCHNS_PER_BUCKET)
+#define NR_EVTCHN_BUCKETS  (NR_EVENT_CHANNELS_L3 / EVTCHNS_PER_BUCKET)
 
 struct vcpu_time_info {
     /*
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 14/16] Generalized event channel operations
  2013-01-31 14:42 [PATCH 00/16] Implement 3-level event channel in Xen Wei Liu
                   ` (12 preceding siblings ...)
  2013-01-31 14:43 ` [PATCH 13/16] Make NR_EVTCHN_BUCKETS 3-level ready Wei Liu
@ 2013-01-31 14:43 ` Wei Liu
  2013-01-31 14:43 ` [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages Wei Liu
  2013-01-31 14:43 ` [PATCH 16/16] Implement 3-level event channel routines Wei Liu
  15 siblings, 0 replies; 32+ messages in thread
From: Wei Liu @ 2013-01-31 14:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, ian.campbell, jbeulich, david.vrabel

Use pointers in struct domain to reference the evtchn_pending and
evtchn_mask bitmaps.

When building a domain, the 2-level operation set is installed by
default.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/arm/domain.c      |    1 +
 xen/arch/x86/domain.c      |    1 +
 xen/common/event_channel.c |   65 ++++++++++++++++++++++++++++++++++++--------
 xen/include/xen/event.h    |    3 ++
 4 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 59d8d73..bc477f6 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -417,6 +417,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
         goto fail;
 
     clear_page(d->shared_info);
+    evtchn_set_default_bitmap(d);
     share_xen_page_with_guest(
         virt_to_page(d->shared_info), d, XENSHARE_writable);
 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index a58cc1a..a669dc0 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -580,6 +580,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
             goto fail;
 
         clear_page(d->shared_info);
+        evtchn_set_default_bitmap(d);
         share_xen_page_with_guest(
             virt_to_page(d->shared_info), d, XENSHARE_writable);
 
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 37fecee..1ce97b0 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -51,6 +51,9 @@
 
 #define consumer_is_xen(e) (!!(e)->xen_consumer)
 
+static void evtchn_set_pending(struct vcpu *v, int port);
+static void evtchn_clear_pending(struct domain *d, int port);
+
 /*
  * The function alloc_unbound_xen_event_channel() allows an arbitrary
  * notifier function to be specified. However, very few unique functions
@@ -94,9 +97,6 @@ static uint8_t get_xen_consumer(xen_event_channel_notification_t fn)
 /* Get the notification function for a given Xen-bound event channel. */
 #define xen_notification_fn(e) (xen_consumers[(e)->xen_consumer-1])
 
-static void evtchn_set_pending(struct vcpu *v, int port);
-static void evtchn_clear_pending(struct domain *d, int port);
-
 static int virq_is_global(uint32_t virq)
 {
     int rc;
@@ -159,15 +159,18 @@ static int get_free_port(struct domain *d)
 
 int evtchn_is_pending(struct domain *d, int port)
 {
-    return test_bit(port, &shared_info(d, evtchn_pending));
+    unsigned int page_no = EVTCHN_PAGE_NO(port);
+    unsigned int offset = EVTCHN_OFFSET_IN_PAGE(port);
+    return test_bit(offset, d->evtchn_pending[page_no]);
 }
 
 int evtchn_is_masked(struct domain *d, int port)
 {
-    return test_bit(port, &shared_info(d, evtchn_mask));
+    unsigned int page_no = EVTCHN_PAGE_NO(port);
+    unsigned int offset = EVTCHN_OFFSET_IN_PAGE(port);
+    return test_bit(offset, d->evtchn_mask[page_no]);
 }
 
-
 static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc)
 {
     struct evtchn *chn;
@@ -623,7 +626,7 @@ out:
     return ret;
 }
 
-static void evtchn_set_pending(struct vcpu *v, int port)
+static void evtchn_set_pending_l2(struct vcpu *v, int port)
 {
     struct domain *d = v->domain;
     int vcpuid;
@@ -664,9 +667,25 @@ static void evtchn_set_pending(struct vcpu *v, int port)
     }
 }
 
+static void evtchn_set_pending(struct vcpu *v, int port)
+{
+    struct domain *d = v->domain;
+
+    switch ( d->evtchn_level )
+    {
+    case EVTCHN_2_LEVEL:
+        evtchn_set_pending_l2(v, port);
+        break;
+    default:
+        BUG();
+    }
+}
+
 static void evtchn_clear_pending(struct domain *d, int port)
 {
-    clear_bit(port, &shared_info(d, evtchn_pending));
+    unsigned int page_no = EVTCHN_PAGE_NO(port);
+    unsigned int offset = EVTCHN_OFFSET_IN_PAGE(port);
+    clear_bit(offset, d->evtchn_pending[page_no]);
 }
 
 int guest_enabled_event(struct vcpu *v, uint32_t virq)
@@ -932,10 +951,12 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id)
 }
 
 
-int evtchn_unmask(unsigned int port)
+static int evtchn_unmask_l2(unsigned int port)
 {
     struct domain *d = current->domain;
     struct vcpu   *v;
+    unsigned int page_no = EVTCHN_PAGE_NO(port);
+    unsigned int offset = EVTCHN_OFFSET_IN_PAGE(port);
 
     ASSERT(spin_is_locked(&d->event_lock));
 
@@ -948,8 +969,8 @@ int evtchn_unmask(unsigned int port)
      * These operations must happen in strict order. Based on
      * include/xen/event.h:evtchn_set_pending().
      */
-    if ( test_and_clear_bit(port, &shared_info(d, evtchn_mask)) &&
-         test_bit          (port, &shared_info(d, evtchn_pending)) &&
+    if ( test_and_clear_bit(offset, d->evtchn_mask[page_no]) &&
+         test_bit          (offset, d->evtchn_pending[page_no]) &&
          !test_and_set_bit (port / BITS_PER_EVTCHN_WORD(d),
                             &vcpu_info(v, evtchn_pending_sel)) )
     {
@@ -959,6 +980,23 @@ int evtchn_unmask(unsigned int port)
     return 0;
 }
 
+int evtchn_unmask(unsigned int port)
+{
+    struct domain *d = current->domain;
+    int rc = 0;
+
+    switch ( d->evtchn_level )
+    {
+    case EVTCHN_2_LEVEL:
+        rc = evtchn_unmask_l2(port);
+        break;
+    default:
+        BUG();
+    }
+
+    return rc;
+}
+
 
 static long evtchn_reset(evtchn_reset_t *r)
 {
@@ -1185,6 +1223,11 @@ void notify_via_xen_event_channel(struct domain *ld, int lport)
     spin_unlock(&ld->event_lock);
 }
 
+void evtchn_set_default_bitmap(struct domain *d)
+{
+    d->evtchn_pending[0] = (unsigned long *)shared_info(d, evtchn_pending);
+    d->evtchn_mask[0] = (unsigned long *)shared_info(d, evtchn_mask);
+}
 
 int evtchn_init(struct domain *d)
 {
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index cacd89d..34a82d0 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -145,6 +145,9 @@ int guest_enabled_event(struct vcpu *v, uint32_t virq);
 /* Notify remote end of a Xen-attached event channel.*/
 void notify_via_xen_event_channel(struct domain *ld, int lport);
 
+/* This is called after the domain's shared info page is set up */
+void evtchn_set_default_bitmap(struct domain *d);
+
 /* Internal event channel object accessors */
 #define bucket_from_port(d,p) \
     ((d)->evtchn[(p)/EVTCHNS_PER_BUCKET])
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
  2013-01-31 14:42 [PATCH 00/16] Implement 3-level event channel in Xen Wei Liu
                   ` (13 preceding siblings ...)
  2013-01-31 14:43 ` [PATCH 14/16] Generalized event channel operations Wei Liu
@ 2013-01-31 14:43 ` Wei Liu
  2013-02-04  9:23   ` Jan Beulich
  2013-01-31 14:43 ` [PATCH 16/16] Implement 3-level event channel routines Wei Liu
  15 siblings, 1 reply; 32+ messages in thread
From: Wei Liu @ 2013-01-31 14:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, ian.campbell, jbeulich, david.vrabel

NOTE: the registration call always fails because other parts of the code are
not yet complete.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/event_channel.c |  278 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 278 insertions(+)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 1ce97b0..c448c60 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -26,6 +26,7 @@
 #include <xen/compat.h>
 #include <xen/guest_access.h>
 #include <xen/keyhandler.h>
+#include <xen/paging.h>
 #include <asm/current.h>
 
 #include <public/xen.h>
@@ -1024,6 +1025,258 @@ out:
 }
 
 
+static long __map_l3_arrays(struct domain *d, xen_pfn_t *pending,
+                            xen_pfn_t *mask, int nr_pages)
+{
+    int rc;
+    void *mapping;
+    struct page_info *pginfo;
+    unsigned long gfn;
+    int pending_count = 0, mask_count = 0;
+
+#define __MAP(src, dst, cnt)                                    \
+    for ( (cnt) = 0; (cnt) < nr_pages; (cnt)++ )                \
+    {                                                           \
+        rc = -EINVAL;                                           \
+        gfn = (src)[(cnt)];                                     \
+        pginfo = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);    \
+        if ( !pginfo )                                          \
+            goto err;                                           \
+        if ( !get_page_type(pginfo, PGT_writable_page) )        \
+        {                                                       \
+            put_page(pginfo);                                   \
+            goto err;                                           \
+        }                                                       \
+        mapping = __map_domain_page_global(pginfo);             \
+        if ( !mapping )                                         \
+        {                                                       \
+            put_page_and_type(pginfo);                          \
+            rc = -ENOMEM;                                       \
+            goto err;                                           \
+        }                                                       \
+        (dst)[(cnt)] = mapping;                                 \
+    }
+
+    __MAP(pending, d->evtchn_pending, pending_count)
+    __MAP(mask, d->evtchn_mask, mask_count)
+#undef __MAP
+
+    rc = 0;
+
+ err:
+    return rc;
+}
+
+static void __unmap_l3_arrays(struct domain *d)
+{
+    int i;
+    unsigned long mfn;
+
+    for ( i = 0; i < EVTCHN_MAX_L3_PAGES; i++ )
+    {
+        if ( d->evtchn_pending[i] != 0 )
+        {
+            mfn = domain_page_map_to_mfn(d->evtchn_pending[i]);
+            unmap_domain_page_global(d->evtchn_pending[i]);
+            put_page_and_type(mfn_to_page(mfn));
+            d->evtchn_pending[i] = 0;
+        }
+        if ( d->evtchn_mask[i] != 0 )
+        {
+            mfn = domain_page_map_to_mfn(d->evtchn_mask[i]);
+            unmap_domain_page_global(d->evtchn_mask[i]);
+            put_page_and_type(mfn_to_page(mfn));
+            d->evtchn_mask[i] = 0;
+        }
+    }
+}
+
+static long __map_l2_selector(struct vcpu *v, unsigned long gfn,
+                              unsigned long off)
+{
+    void *mapping;
+    int rc;
+    struct page_info *page;
+    struct domain *d = v->domain;
+
+    rc = -EINVAL;   /* common errno for following operations */
+
+    /* Sanity check: the L2 selector has a maximum size of
+     * sizeof(unsigned long) * 8, which equals the size of the shared
+     * bitmap array of the 2-level event channel. */
+    if ( off + sizeof(unsigned long) * 8 >= PAGE_SIZE )
+        goto out;
+
+    page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
+    if ( !page )
+        goto out;
+
+    if ( !get_page_type(page, PGT_writable_page) )
+    {
+        put_page(page);
+        goto out;
+    }
+
+    /* Use a global mapping here, because we need to map the selector
+     * for another vcpu (v != current). However, this mapping is only
+     * used by v when the guest is running. */
+    mapping = __map_domain_page_global(page);
+
+    if ( mapping == NULL )
+    {
+        put_page_and_type(page);
+        rc = -ENOMEM;
+        goto out;
+    }
+
+    v->evtchn_pending_sel_l2 = mapping + off;
+    rc = 0;
+
+ out:
+    return rc;
+}
+
+static void __unmap_l2_selector(struct vcpu *v)
+{
+    unsigned long mfn;
+
+    if ( v->evtchn_pending_sel_l2 )
+    {
+        mfn = domain_page_map_to_mfn(v->evtchn_pending_sel_l2);
+        unmap_domain_page_global(v->evtchn_pending_sel_l2);
+        put_page_and_type(mfn_to_page(mfn));
+        v->evtchn_pending_sel_l2 = NULL;
+    }
+}
+
+static void __evtchn_unmap_all_3level(struct domain *d)
+{
+    struct vcpu *v;
+    for_each_vcpu ( d, v )
+        __unmap_l2_selector(v);
+    __unmap_l3_arrays(d);
+}
+
+static void __evtchn_setup_bitmap_l3(struct domain *d)
+{
+    struct vcpu *v;
+
+    /* Easy way to set up the 3-level bitmap: move the existing selector
+     * to the next level, then copy the pending and mask arrays. */
+    for_each_vcpu ( d, v )
+    {
+        memcpy(&v->evtchn_pending_sel_l2[0],
+               &vcpu_info(v, evtchn_pending_sel),
+               sizeof(vcpu_info(v, evtchn_pending_sel)));
+        memset(&vcpu_info(v, evtchn_pending_sel), 0,
+               sizeof(vcpu_info(v, evtchn_pending_sel)));
+        set_bit(0, &vcpu_info(v, evtchn_pending_sel));
+    }
+
+    memcpy(d->evtchn_pending[0], &shared_info(d, evtchn_pending),
+           sizeof(shared_info(d, evtchn_pending)));
+    memcpy(d->evtchn_mask[0], &shared_info(d, evtchn_mask),
+           sizeof(shared_info(d, evtchn_mask)));
+}
+
+static long evtchn_register_3level(evtchn_register_3level_t *arg)
+{
+    struct domain *d = current->domain;
+    struct vcpu *v;
+    int rc = 0;
+    xen_pfn_t evtchn_pending[EVTCHN_MAX_L3_PAGES];
+    xen_pfn_t evtchn_mask[EVTCHN_MAX_L3_PAGES];
+    xen_pfn_t l2sel_mfn = 0;
+    xen_pfn_t l2sel_offset = 0;
+
+    if ( d->evtchn_level == EVTCHN_3_LEVEL )
+    {
+        rc = -EINVAL;
+        goto out;
+    }
+
+    if ( arg->nr_vcpus > d->max_vcpus ||
+         arg->nr_pages > EVTCHN_MAX_L3_PAGES )
+    {
+        rc = -EINVAL;
+        goto out;
+    }
+
+    memset(evtchn_pending, 0, sizeof(xen_pfn_t) * EVTCHN_MAX_L3_PAGES);
+    memset(evtchn_mask, 0, sizeof(xen_pfn_t) * EVTCHN_MAX_L3_PAGES);
+
+#define __COPY_ARRAY(_d, _s, _nr)                                       \
+    do {                                                                \
+        if ( copy_from_guest((_d), (_s), (_nr)) )                       \
+        {                                                               \
+            rc = -EFAULT;                                               \
+            goto out;                                                   \
+        }                                                               \
+    } while (0)
+    __COPY_ARRAY(evtchn_pending, arg->evtchn_pending, arg->nr_pages);
+    __COPY_ARRAY(evtchn_mask, arg->evtchn_mask, arg->nr_pages);
+#undef __COPY_ARRAY
+
+    rc = __map_l3_arrays(d, evtchn_pending, evtchn_mask, arg->nr_pages);
+    if ( rc )
+        goto out;
+
+    for_each_vcpu ( d, v )
+    {
+        int vcpu_id = v->vcpu_id;
+
+        if ( unlikely(copy_from_guest_offset(&l2sel_mfn, arg->l2sel_mfns,
+                                             vcpu_id, 1)) )
+        {
+            rc = -EFAULT;
+            __evtchn_unmap_all_3level(d);
+            goto out;
+        }
+        if ( unlikely(copy_from_guest_offset(&l2sel_offset, arg->l2sel_offsets,
+                                             vcpu_id, 1)) )
+        {
+            rc = -EFAULT;
+            __evtchn_unmap_all_3level(d);
+            goto out;
+        }
+        if ( (rc = __map_l2_selector(v, l2sel_mfn, l2sel_offset)) )
+        {
+            __evtchn_unmap_all_3level(d);
+            goto out;
+        }
+    }
+
+    __evtchn_setup_bitmap_l3(d);
+
+    d->evtchn_level = EVTCHN_3_LEVEL;
+
+    rc = 0;
+
+ out:
+    return rc;
+}
+
+static long evtchn_register_nlevel(struct evtchn_register_nlevel *reg)
+{
+    struct domain *d = current->domain;
+    int rc;
+
+    spin_lock(&d->event_lock);
+
+    switch ( reg->level )
+    {
+    case EVTCHN_3_LEVEL:
+        rc = evtchn_register_3level(&reg->u.l3);
+        break;
+    default:
+        rc = -EINVAL;
+    }
+
+    spin_unlock(&d->event_lock);
+
+    return rc;
+}
+
 long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     long rc;
@@ -1132,6 +1385,18 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case EVTCHNOP_register_nlevel: {
+        struct evtchn_register_nlevel reg;
+        if ( copy_from_guest(&reg, arg, 1) != 0 )
+            return -EFAULT;
+        rc = evtchn_register_nlevel(&reg);
+
+        /* XXX always fails this call because it is not yet completed */
+        rc = -EINVAL;
+
+        break;
+    }
+
     default:
         rc = -ENOSYS;
         break;
@@ -1258,6 +1523,17 @@ int evtchn_init(struct domain *d)
     return 0;
 }
 
+static void evtchn_unmap_nlevel(struct domain *d)
+{
+    switch ( d->evtchn_level )
+    {
+    case EVTCHN_3_LEVEL:
+        __evtchn_unmap_all_3level(d);
+        break;
+    default:
+        break;
+    }
+}
 
 void evtchn_destroy(struct domain *d)
 {
@@ -1286,6 +1562,8 @@ void evtchn_destroy(struct domain *d)
 
     clear_global_virq_handlers(d);
 
+    evtchn_unmap_nlevel(d);
+
     free_xenheap_page(d->evtchn);
 }
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 16/16] Implement 3-level event channel routines
  2013-01-31 14:42 [PATCH 00/16] Implement 3-level event channel in Xen Wei Liu
                   ` (14 preceding siblings ...)
  2013-01-31 14:43 ` [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages Wei Liu
@ 2013-01-31 14:43 ` Wei Liu
  15 siblings, 0 replies; 32+ messages in thread
From: Wei Liu @ 2013-01-31 14:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, ian.campbell, jbeulich, david.vrabel

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/event_channel.c |  110 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 90 insertions(+), 20 deletions(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index c448c60..a0bd00f 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -627,10 +627,33 @@ out:
     return ret;
 }
 
+static void __check_vcpu_polling(struct vcpu *v, int port)
+{
+    int vcpuid;
+    struct domain *d = v->domain;
+
+    /* Check if some VCPU might be polling for this event. */
+    if ( likely(bitmap_empty(d->poll_mask, d->max_vcpus)) )
+        return;
+
+    /* Wake any interested (or potentially interested) pollers. */
+    for ( vcpuid = find_first_bit(d->poll_mask, d->max_vcpus);
+          vcpuid < d->max_vcpus;
+          vcpuid = find_next_bit(d->poll_mask, d->max_vcpus, vcpuid+1) )
+    {
+        v = d->vcpu[vcpuid];
+        if ( ((v->poll_evtchn <= 0) || (v->poll_evtchn == port)) &&
+             test_and_clear_bit(vcpuid, d->poll_mask) )
+        {
+            v->poll_evtchn = 0;
+            vcpu_unblock(v);
+        }
+    }
+}
+
 static void evtchn_set_pending_l2(struct vcpu *v, int port)
 {
     struct domain *d = v->domain;
-    int vcpuid;
 
     /*
      * The following bit operations must happen in strict order.
@@ -649,23 +672,35 @@ static void evtchn_set_pending_l2(struct vcpu *v, int port)
         vcpu_mark_events_pending(v);
     }
 
-    /* Check if some VCPU might be polling for this event. */
-    if ( likely(bitmap_empty(d->poll_mask, d->max_vcpus)) )
-        return;
+    __check_vcpu_polling(v, port);
+}
 
-    /* Wake any interested (or potentially interested) pollers. */
-    for ( vcpuid = find_first_bit(d->poll_mask, d->max_vcpus);
-          vcpuid < d->max_vcpus;
-          vcpuid = find_next_bit(d->poll_mask, d->max_vcpus, vcpuid+1) )
+static void evtchn_set_pending_l3(struct vcpu *v, int port)
+{
+    struct domain *d = v->domain;
+    unsigned int page_no = EVTCHN_PAGE_NO(port);
+    unsigned int offset = EVTCHN_OFFSET_IN_PAGE(port);
+    unsigned int l1bit = port >> (EVTCHN_WORD_BITORDER(d) << 1);
+    unsigned int l2bit = port >> EVTCHN_WORD_BITORDER(d);
+
+    /*
+     * The following bit operations must happen in strict order.
+     * NB. On x86, the atomic bit operations also act as memory barriers.
+     * There is therefore sufficiently strict ordering for this architecture --
+     * others may require explicit memory barriers.
+     */
+
+    if ( test_and_set_bit(offset, d->evtchn_pending[page_no]) )
+         return;
+
+    if ( !test_bit(offset, d->evtchn_mask[page_no]) &&
+         !test_and_set_bit(l2bit, v->evtchn_pending_sel_l2) &&
+         !test_and_set_bit(l1bit, &vcpu_info(v, evtchn_pending_sel)) )
     {
-        v = d->vcpu[vcpuid];
-        if ( ((v->poll_evtchn <= 0) || (v->poll_evtchn == port)) &&
-             test_and_clear_bit(vcpuid, d->poll_mask) )
-        {
-            v->poll_evtchn = 0;
-            vcpu_unblock(v);
-        }
+        vcpu_mark_events_pending(v);
     }
+
+    __check_vcpu_polling(v, port);
 }
 
 static void evtchn_set_pending(struct vcpu *v, int port)
@@ -677,6 +712,9 @@ static void evtchn_set_pending(struct vcpu *v, int port)
     case EVTCHN_2_LEVEL:
         evtchn_set_pending_l2(v, port);
         break;
+    case EVTCHN_3_LEVEL:
+        evtchn_set_pending_l3(v, port);
+        break;
     default:
         BUG();
     }
@@ -981,6 +1019,37 @@ static int evtchn_unmask_l2(unsigned int port)
     return 0;
 }
 
+static int evtchn_unmask_l3(unsigned int port)
+{
+    struct domain *d = current->domain;
+    struct vcpu   *v;
+    unsigned int page_no = EVTCHN_PAGE_NO(port);
+    unsigned int offset = EVTCHN_OFFSET_IN_PAGE(port);
+    unsigned int l1bit = port >> (EVTCHN_WORD_BITORDER(d) << 1);
+    unsigned int l2bit = port >> EVTCHN_WORD_BITORDER(d);
+
+    ASSERT(spin_is_locked(&d->event_lock));
+
+    if ( unlikely(!port_is_valid(d, port)) )
+        return -EINVAL;
+
+    v = d->vcpu[evtchn_from_port(d, port)->notify_vcpu_id];
+
+    /*
+     * These operations must happen in strict order. Based on
+     * include/xen/event.h:evtchn_set_pending().
+     */
+    if ( test_and_clear_bit(offset, d->evtchn_mask[page_no]) &&
+         test_bit          (offset, d->evtchn_pending[page_no]) &&
+         !test_and_set_bit (l2bit, v->evtchn_pending_sel_l2) &&
+         !test_and_set_bit (l1bit, &vcpu_info(v, evtchn_pending_sel)) )
+    {
+        vcpu_mark_events_pending(v);
+    }
+
+    return 0;
+}
+
 int evtchn_unmask(unsigned int port)
 {
     struct domain *d = current->domain;
@@ -991,6 +1060,9 @@ int evtchn_unmask(unsigned int port)
     case EVTCHN_2_LEVEL:
         rc = evtchn_unmask_l2(port);
         break;
+    case EVTCHN_3_LEVEL:
+        rc = evtchn_unmask_l3(port);
+        break;
     default:
         BUG();
     }
@@ -1390,10 +1462,6 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&reg, arg, 1) != 0 )
             return -EFAULT;
         rc = evtchn_register_nlevel(&reg);
-
-        /* XXX always fails this call because it is not yet completed */
-        rc = -EINVAL;
-
         break;
     }
 
@@ -1602,8 +1670,10 @@ static void domain_dump_evtchn_info(struct domain *d)
     bitmap_scnlistprintf(keyhandler_scratch, sizeof(keyhandler_scratch),
                          d->poll_mask, d->max_vcpus);
     printk("Event channel information for domain %d:\n"
+           "Using %d-level event channel\n"
            "Polling vCPUs: {%s}\n"
-           "    port [p/m]\n", d->domain_id, keyhandler_scratch);
+           "    port [p/m]\n",
+           d->domain_id, d->evtchn_level, keyhandler_scratch);
 
     spin_lock(&d->event_lock);
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/16] Move event channel macros / struct definition to proper place
  2013-01-31 14:42 ` [PATCH 04/16] Move event channel macros / struct definition to proper place Wei Liu
@ 2013-02-04  9:00   ` Jan Beulich
  2013-02-04 10:25     ` Wei Liu
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2013-02-04  9:00 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, david.vrabel, ian.campbell

>>> On 31.01.13 at 15:42, Wei Liu <wei.liu2@citrix.com> wrote:
> --- a/xen/include/public/xen.h
> +++ b/xen/include/public/xen.h
> @@ -557,6 +557,8 @@ DEFINE_XEN_GUEST_HANDLE(multicall_entry_t);
>   *  1024 if a long is 32 bits; 4096 if a long is 64 bits.
>   */
>  #define NR_EVENT_CHANNELS (sizeof(unsigned long) * sizeof(unsigned long) * 64)
> +#define EVTCHNS_PER_BUCKET 128
> +#define NR_EVTCHN_BUCKETS  (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET)

These aren't part of the hypercall ABI, and hence don't belong here.
What is preventing you from putting them alongside the other
stuff you move to xen/include/xen/event.h?

Jan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
  2013-01-31 14:43 ` [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages Wei Liu
@ 2013-02-04  9:23   ` Jan Beulich
  2013-02-04 11:20     ` Ian Campbell
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2013-02-04  9:23 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, david.vrabel, ian.campbell

>>> On 31.01.13 at 15:43, Wei Liu <wei.liu2@citrix.com> wrote:
> +static long __map_l3_arrays(struct domain *d, xen_pfn_t *pending,
> +                            xen_pfn_t *mask, int nr_pages)
> +{
> +    int rc;
> +    void *mapping;
> +    struct page_info *pginfo;
> +    unsigned long gfn;
> +    int pending_count = 0, mask_count = 0;
> +
> +#define __MAP(src, dst, cnt)                                    \
> +    for ( (cnt) = 0; (cnt) < nr_pages; (cnt)++ )                \
> +    {                                                           \
> +        rc = -EINVAL;                                           \
> +        gfn = (src)[(cnt)];                                     \
> +        pginfo = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);    \
> +        if ( !pginfo )                                          \
> +            goto err;                                           \
> +        if ( !get_page_type(pginfo, PGT_writable_page) )        \
> +        {                                                       \
> +            put_page(pginfo);                                   \
> +            goto err;                                           \
> +        }                                                       \
> +        mapping = __map_domain_page_global(pginfo);             \
> +        if ( !mapping )                                         \
> +        {                                                       \
> +            put_page_and_type(pginfo);                          \
> +            rc = -ENOMEM;                                       \
> +            goto err;                                           \
> +        }                                                       \
> +        (dst)[(cnt)] = mapping;                                 \
> +    }
> +
> +    __MAP(pending, d->evtchn_pending, pending_count)
> +    __MAP(mask, d->evtchn_mask, mask_count)
> +#undef __MAP
> +
> +    rc = 0;
> +
> + err:
> +    return rc;
> +}

So this alone already is up to 16 pages per guest, and hence a
theoretical maximum of 512k pages, i.e. 2G mapped space. The
global page mapping area, however, is only 1Gb in size on x86-64
(didn't check ARM at all)... Which is why I said that you need to
at least explain why bumping that address range isn't necessary
(i.e. if we think that we really don't want to support the
maximum number of guests allowed in theory, and that their
amount is really always going to be low enough to also not run
into resource conflicts with other users of the interface).

> +static long evtchn_register_3level(evtchn_register_3level_t *arg)
> +{
> +    struct domain *d = current->domain;
> +    struct vcpu *v;
> +    int rc = 0;
> +    xen_pfn_t evtchn_pending[EVTCHN_MAX_L3_PAGES];
> +    xen_pfn_t evtchn_mask[EVTCHN_MAX_L3_PAGES];
> +    xen_pfn_t l2sel_mfn = 0;
> +    xen_pfn_t l2sel_offset = 0;
> +
> +    if ( d->evtchn_level == EVTCHN_3_LEVEL )
> +    {
> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
> +    if ( arg->nr_vcpus > d->max_vcpus ||
> +         arg->nr_pages > EVTCHN_MAX_L3_PAGES )
> +    {
> +        rc = -EINVAL;
> +        goto out;
> +    }
> +
> +    memset(evtchn_pending, 0, sizeof(xen_pfn_t) * EVTCHN_MAX_L3_PAGES);
> +    memset(evtchn_mask, 0, sizeof(xen_pfn_t) * EVTCHN_MAX_L3_PAGES);
> +
> +#define __COPY_ARRAY(_d, _s, _nr)                                       \
> +    do {                                                                \
> +        if ( copy_from_guest((_d), (_s), (_nr)) )                       \
> +        {                                                               \
> +            rc = -EFAULT;                                               \
> +            goto out;                                                   \
> +        }                                                               \
> +    } while (0)
> +    __COPY_ARRAY(evtchn_pending, arg->evtchn_pending, arg->nr_pages);
> +    __COPY_ARRAY(evtchn_mask, arg->evtchn_mask, arg->nr_pages);
> +#undef __COPY_ARRAY

I don't think this really benefits from using the __COPY_ARRAY()
macro.
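
Expanded by hand, the two invocations would be no longer open-coded (a
sketch against the quoted code above, not a tested replacement):

```c
    if ( copy_from_guest(evtchn_pending, arg->evtchn_pending, arg->nr_pages) ||
         copy_from_guest(evtchn_mask, arg->evtchn_mask, arg->nr_pages) )
    {
        rc = -EFAULT;
        goto out;
    }
```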

Jan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/16] Move event channel macros / struct definition to proper place
  2013-02-04  9:00   ` Jan Beulich
@ 2013-02-04 10:25     ` Wei Liu
  0 siblings, 0 replies; 32+ messages in thread
From: Wei Liu @ 2013-02-04 10:25 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xen.org, wei.liu2, David Vrabel, Ian Campbell

On Mon, 2013-02-04 at 09:00 +0000, Jan Beulich wrote:
> >>> On 31.01.13 at 15:42, Wei Liu <wei.liu2@citrix.com> wrote:
> > --- a/xen/include/public/xen.h
> > +++ b/xen/include/public/xen.h
> > @@ -557,6 +557,8 @@ DEFINE_XEN_GUEST_HANDLE(multicall_entry_t);
> >   *  1024 if a long is 32 bits; 4096 if a long is 64 bits.
> >   */
> >  #define NR_EVENT_CHANNELS (sizeof(unsigned long) * sizeof(unsigned long) * 64)
> > +#define EVTCHNS_PER_BUCKET 128
> > +#define NR_EVTCHN_BUCKETS  (NR_EVENT_CHANNELS / EVTCHNS_PER_BUCKET)
> 
> These aren't part of the hypercall ABI, and hence don't belong here.
> What is preventing you from putting them alongside the other
> stuff you move to xen/include/xen/event.h?
> 

That would cause circular inclusion and break the build.

a) sched.h: struct domain references NR_EVTCHN_BUCKETS
b) event.h: references sched.h

Now a second thought comes to me: a clean fix would be to first change
how evtchns are allocated in struct domain, then move those macros /
definitions to the proper place.


Wei.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
  2013-02-04  9:23   ` Jan Beulich
@ 2013-02-04 11:20     ` Ian Campbell
  2013-02-04 11:29       ` Jan Beulich
  2013-02-04 11:37       ` Wei Liu
  0 siblings, 2 replies; 32+ messages in thread
From: Ian Campbell @ 2013-02-04 11:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tim Deegan, Wei Liu, David Vrabel, xen-devel@lists.xen.org

On Mon, 2013-02-04 at 09:23 +0000, Jan Beulich wrote:
> >>> On 31.01.13 at 15:43, Wei Liu <wei.liu2@citrix.com> wrote:
> > +static long __map_l3_arrays(struct domain *d, xen_pfn_t *pending,
> > +                            xen_pfn_t *mask, int nr_pages)
> > +{
> > +    int rc;
> > +    void *mapping;
> > +    struct page_info *pginfo;
> > +    unsigned long gfn;
> > +    int pending_count = 0, mask_count = 0;
> > +
> > +#define __MAP(src, dst, cnt)                                    \
> > +    for ( (cnt) = 0; (cnt) < nr_pages; (cnt)++ )                \
> > +    {                                                           \
> > +        rc = -EINVAL;                                           \
> > +        gfn = (src)[(cnt)];                                     \
> > +        pginfo = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);    \
> > +        if ( !pginfo )                                          \
> > +            goto err;                                           \
> > +        if ( !get_page_type(pginfo, PGT_writable_page) )        \
> > +        {                                                       \
> > +            put_page(pginfo);                                   \
> > +            goto err;                                           \
> > +        }                                                       \
> > +        mapping = __map_domain_page_global(pginfo);             \
> > +        if ( !mapping )                                         \
> > +        {                                                       \
> > +            put_page_and_type(pginfo);                          \
> > +            rc = -ENOMEM;                                       \
> > +            goto err;                                           \
> > +        }                                                       \
> > +        (dst)[(cnt)] = mapping;                                 \
> > +    }
> > +
> > +    __MAP(pending, d->evtchn_pending, pending_count)
> > +    __MAP(mask, d->evtchn_mask, mask_count)
> > +#undef __MAP
> > +
> > +    rc = 0;
> > +
> > + err:
> > +    return rc;
> > +}
> 
> So this alone already is up to 16 pages per guest, and hence a
> theoretical maximum of 512k pages, i.e. 2G mapped space.

That's given a theoretical 32k guests? Ouch. It also ignores the need
for other global mappings.

On the flip side, only a minority of domains are likely to be using the
extended scheme, and I expect even those which are would not be using
all 16 pages, so maybe we can fault them in on demand as we bind/unbind
evtchns.

Where does 16 come from? How many pages do we end up with at each level
in the new scheme?

Some levels of the trie are per-VCPU, did you account for that already
in the 2GB?

>  The
> global page mapping area, however, is only 1Gb in size on x86-64
> (didn't check ARM at all)...

There isn't currently a global page mapping area on 32-bit ARM (I
suppose we have avoided them somehow...) but obviously 2G would be a
problem in a 4GB address space.

On ARM we currently have 2G for domheap mappings which I suppose we
would split if we needed a global page map

These need to be global so we can deliver evtchns to VCPUs which aren't
running, right? I suppose mapping on demand (other than for a running
VCPU) would be prohibitively expensive.

Could we make this space per-VCPU (or per-domain) by saying that a
domain maps its own evtchn pages plus the required pages from other
domains with which an evtchn is bound? Might be tricky to arrange
though, especially with the per-VCPU pages and affinity changes?

Ian.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
  2013-02-04 11:20     ` Ian Campbell
@ 2013-02-04 11:29       ` Jan Beulich
  2013-02-04 13:45         ` Wei Liu
  2013-02-04 11:37       ` Wei Liu
  1 sibling, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2013-02-04 11:29 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Tim Deegan, Wei Liu, David Vrabel, xen-devel@lists.xen.org

>>> On 04.02.13 at 12:20, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Mon, 2013-02-04 at 09:23 +0000, Jan Beulich wrote:
>> >>> On 31.01.13 at 15:43, Wei Liu <wei.liu2@citrix.com> wrote:
>> > +static long __map_l3_arrays(struct domain *d, xen_pfn_t *pending,
>> > +                            xen_pfn_t *mask, int nr_pages)
>> > +{
>> > +    int rc;
>> > +    void *mapping;
>> > +    struct page_info *pginfo;
>> > +    unsigned long gfn;
>> > +    int pending_count = 0, mask_count = 0;
>> > +
>> > +#define __MAP(src, dst, cnt)                                    \
>> > +    for ( (cnt) = 0; (cnt) < nr_pages; (cnt)++ )                \
>> > +    {                                                           \
>> > +        rc = -EINVAL;                                           \
>> > +        gfn = (src)[(cnt)];                                     \
>> > +        pginfo = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);    \
>> > +        if ( !pginfo )                                          \
>> > +            goto err;                                           \
>> > +        if ( !get_page_type(pginfo, PGT_writable_page) )        \
>> > +        {                                                       \
>> > +            put_page(pginfo);                                   \
>> > +            goto err;                                           \
>> > +        }                                                       \
>> > +        mapping = __map_domain_page_global(pginfo);             \
>> > +        if ( !mapping )                                         \
>> > +        {                                                       \
>> > +            put_page_and_type(pginfo);                          \
>> > +            rc = -ENOMEM;                                       \
>> > +            goto err;                                           \
>> > +        }                                                       \
>> > +        (dst)[(cnt)] = mapping;                                 \
>> > +    }
>> > +
>> > +    __MAP(pending, d->evtchn_pending, pending_count)
>> > +    __MAP(mask, d->evtchn_mask, mask_count)
>> > +#undef __MAP
>> > +
>> > +    rc = 0;
>> > +
>> > + err:
>> > +    return rc;
>> > +}
>> 
>> So this alone already is up to 16 pages per guest, and hence a
>> theoretical maximum of 512k pages, i.e. 2G mapped space.
> 
> That's given a theoretical 32k guests? Ouch. It also ignores the need
> for other global mappings.
> 
> on the flip side only a minority of domains are likely to be using the
> extended scheme, and I expect even those which are would not be using
> all 16 pages, so maybe we can fault them in on demand as we bind/unbind
> evtchns.
> 
> Where does 16 come from? How many pages to we end up with at each level
> in the new scheme?

Patch 11 defines EVTCHN_MAX_L3_PAGES to be 8, and we've
got two of them (pending and mask bits).

> Some levels of the trie are per-VCPU, did you account for that already
> in the 2GB?

No, I didn't, as it would only increase the number, and make
the math less clear.

>>  The
>> global page mapping area, however, is only 1Gb in size on x86-64
>> (didn't check ARM at all)...
> 
> There isn't currently a global page mapping area on 32-bit ARM (I
> suppose we have avoided them somehow...) but obviously 2G would be a
> problem in a 4GB address space.
> 
> On ARM we currently have 2G for domheap mappings which I suppose we
> would split if we needed a global page map
> 
> These need to be global so we can deliver evtchns to VCPUs which aren't
> running, right? I suppose mapping on demand (other than for a running
> VCPU) would be prohibitively expensive.

Likely, especially for high rate ones.

> Could we make this space per-VCPU (or per-domain) by saying that a
> domain maps its own evtchn pages plus the required pages from other
> domains with which an evtchn is bound? Might be tricky to arrange
> though, especially with the per-VCPU pages and affinity changes?

Even without that trickiness it wouldn't work I'm afraid: In various
cases we need to be able to raise the events out of context (timer,
IRQs from passed through devices).

Jan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
  2013-02-04 11:20     ` Ian Campbell
  2013-02-04 11:29       ` Jan Beulich
@ 2013-02-04 11:37       ` Wei Liu
  1 sibling, 0 replies; 32+ messages in thread
From: Wei Liu @ 2013-02-04 11:37 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Tim (Xen.org), wei.liu2, David Vrabel, Jan Beulich,
	xen-devel@lists.xen.org

On Mon, 2013-02-04 at 11:20 +0000, Ian Campbell wrote:
> On Mon, 2013-02-04 at 09:23 +0000, Jan Beulich wrote:
> > >>> On 31.01.13 at 15:43, Wei Liu <wei.liu2@citrix.com> wrote:
> > > +static long __map_l3_arrays(struct domain *d, xen_pfn_t *pending,
> > > +                            xen_pfn_t *mask, int nr_pages)
> > > +{
> > > +    int rc;
> > > +    void *mapping;
> > > +    struct page_info *pginfo;
> > > +    unsigned long gfn;
> > > +    int pending_count = 0, mask_count = 0;
> > > +
> > > +#define __MAP(src, dst, cnt)                                    \
> > > +    for ( (cnt) = 0; (cnt) < nr_pages; (cnt)++ )                \
> > > +    {                                                           \
> > > +        rc = -EINVAL;                                           \
> > > +        gfn = (src)[(cnt)];                                     \
> > > +        pginfo = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);    \
> > > +        if ( !pginfo )                                          \
> > > +            goto err;                                           \
> > > +        if ( !get_page_type(pginfo, PGT_writable_page) )        \
> > > +        {                                                       \
> > > +            put_page(pginfo);                                   \
> > > +            goto err;                                           \
> > > +        }                                                       \
> > > +        mapping = __map_domain_page_global(pginfo);             \
> > > +        if ( !mapping )                                         \
> > > +        {                                                       \
> > > +            put_page_and_type(pginfo);                          \
> > > +            rc = -ENOMEM;                                       \
> > > +            goto err;                                           \
> > > +        }                                                       \
> > > +        (dst)[(cnt)] = mapping;                                 \
> > > +    }
> > > +
> > > +    __MAP(pending, d->evtchn_pending, pending_count)
> > > +    __MAP(mask, d->evtchn_mask, mask_count)
> > > +#undef __MAP
> > > +
> > > +    rc = 0;
> > > +
> > > + err:
> > > +    return rc;
> > > +}
> > 
> > So this alone already is up to 16 pages per guest, and hence a
> > theoretical maximum of 512k pages, i.e. 2G mapped space.
> 
> That's given a theoretical 32k guests? Ouch. It also ignores the need
> for other global mappings.
> 
> on the flip side only a minority of domains are likely to be using the
> extended scheme, and I expect even those which are would not be using
> all 16 pages, so maybe we can fault them in on demand as we bind/unbind
> evtchns.
> 

This is doable. However, I'm afraid checking for mapping validity on the
hot path could incur a performance penalty.

> Where does 16 come from? How many pages to we end up with at each level
> in the new scheme?
> 

For a 64-bit guest, that is 8 pages each for evtchn_pending /
evtchn_mask. There are also other global mappings for the per-vCPU L2
selectors - there is no API for a vCPU to manipulate another vCPU's
mapping. So the worst case is that a domain with hundreds of vCPUs
using 3-level event channels could create a lot of global mappings.

> Some levels of the trie are per-VCPU, did you account for that already
> in the 2GB?
> 
> >  The
> > global page mapping area, however, is only 1Gb in size on x86-64
> > (didn't check ARM at all)...
> 
> There isn't currently a global page mapping area on 32-bit ARM (I
> suppose we have avoided them somehow...) but obviously 2G would be a
> problem in a 4GB address space.
> 
> On ARM we currently have 2G for domheap mappings which I suppose we
> would split if we needed a global page map
> 
> These need to be global so we can deliver evtchns to VCPUs which aren't
> running, right? I suppose mapping on demand (other than for a running
> VCPU) would be prohibitively expensive.
> 

Those are the leaf mappings which are supposed to be global.

> Could we make this space per-VCPU (or per-domain) by saying that a
> domain maps its own evtchn pages plus the required pages from other
> domains with which an evtchn is bound? Might be tricky to arrange
> though, especially with the per-VCPU pages and affinity changes?
> 

Really tricky... Also a potential performance penalty.



Wei.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
  2013-02-04 11:29       ` Jan Beulich
@ 2013-02-04 13:45         ` Wei Liu
  2013-02-04 13:47           ` Ian Campbell
  2013-02-04 14:06           ` Jan Beulich
  0 siblings, 2 replies; 32+ messages in thread
From: Wei Liu @ 2013-02-04 13:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim (Xen.org), xen-devel@lists.xen.org, wei.liu2, Ian Campbell,
	David Vrabel

On Mon, 2013-02-04 at 11:29 +0000, Jan Beulich wrote:
> >> 
> >> So this alone already is up to 16 pages per guest, and hence a
> >> theoretical maximum of 512k pages, i.e. 2G mapped space.
> > 
> > That's given a theoretical 32k guests? Ouch. It also ignores the need
> > for other global mappings.
> > 
> > on the flip side only a minority of domains are likely to be using the
> > extended scheme, and I expect even those which are would not be using
> > all 16 pages, so maybe we can fault them in on demand as we bind/unbind
> > evtchns.
> > 
> > Where does 16 come from? How many pages to we end up with at each level
> > in the new scheme?
> 
> Patch 11 defines EVTCHN_MAX_L3_PAGES to be 8, and we've
> got two of them (pending and mask bits).
> 
> > Some levels of the trie are per-VCPU, did you account for that already
> > in the 2GB?
> 
> No, I didn't, as it would only increase the number, and make
> the math less clear.
> 
> >>  The
> >> global page mapping area, however, is only 1Gb in size on x86-64
> >> (didn't check ARM at all)...
> > 
> > There isn't currently a global page mapping area on 32-bit ARM (I
> > suppose we have avoided them somehow...) but obviously 2G would be a
> > problem in a 4GB address space.
> > 
> > On ARM we currently have 2G for domheap mappings which I suppose we
> > would split if we needed a global page map
> > 
> > These need to be global so we can deliver evtchns to VCPUs which aren't
> > running, right? I suppose mapping on demand (other than for a running
> > VCPU) would be prohibitively expensive.
> 
> Likely, especially for high rate ones.
> 
> > Could we make this space per-VCPU (or per-domain) by saying that a
> > domain maps its own evtchn pages plus the required pages from other
> > domains with which an evtchn is bound? Might be tricky to arrange
> > though, especially with the per-VCPU pages and affinity changes?
> 
> Even without that trickiness it wouldn't work I'm afraid: In various
> cases we need to be able to raise the events out of context (timer,
> IRQs from passed through devices).
> 
> Jan

So I come up with the following comment on the 3-level registration
interface (not specific to the __map_l3_arrays() function).

/*
 * Note to 3-level event channel users:
 * Only enable 3-level event channels for Dom0 or driver domains, because
 * a 3-level event channel consumes (16 + nr_vcpus) pages of global
 * mapping area in Xen.
 */



Wei.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
  2013-02-04 13:45         ` Wei Liu
@ 2013-02-04 13:47           ` Ian Campbell
  2013-02-04 13:51             ` Wei Liu
  2013-02-04 14:06           ` Jan Beulich
  1 sibling, 1 reply; 32+ messages in thread
From: Ian Campbell @ 2013-02-04 13:47 UTC (permalink / raw)
  To: Wei Liu; +Cc: Tim (Xen.org), David Vrabel, Jan Beulich, xen-devel@lists.xen.org

On Mon, 2013-02-04 at 13:45 +0000, Wei Liu wrote:

> /*
>  * Note to 3-level event channel users:
>  * Only enable 3-level event channel for Dom0 or driver domains, because
>  * 3-level event channels consumes (16 + nr_vcpus pages) global mapping
>  * area in Xen.
>  */

Can this be enforced by the system administrator?

Ian.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
  2013-02-04 13:47           ` Ian Campbell
@ 2013-02-04 13:51             ` Wei Liu
  2013-02-04 13:54               ` Ian Campbell
  0 siblings, 1 reply; 32+ messages in thread
From: Wei Liu @ 2013-02-04 13:51 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Tim (Xen.org), wei.liu2, David Vrabel, Jan Beulich,
	xen-devel@lists.xen.org

On Mon, 2013-02-04 at 13:47 +0000, Ian Campbell wrote:
> On Mon, 2013-02-04 at 13:45 +0000, Wei Liu wrote:
> 
> > /*
> >  * Note to 3-level event channel users:
> >  * Only enable 3-level event channel for Dom0 or driver domains, because
> >  * 3-level event channels consumes (16 + nr_vcpus pages) global mapping
> >  * area in Xen.
> >  */
> 
> Can this be enforced by the system administrator?
> 

Knowing a domain is Dom0 is easy, but is it possible to know that a
domain is a driver domain?


Wei.

> Ian.
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
  2013-02-04 13:51             ` Wei Liu
@ 2013-02-04 13:54               ` Ian Campbell
  2013-02-04 13:59                 ` Wei Liu
  0 siblings, 1 reply; 32+ messages in thread
From: Ian Campbell @ 2013-02-04 13:54 UTC (permalink / raw)
  To: Wei Liu; +Cc: Tim (Xen.org), David Vrabel, Jan Beulich, xen-devel@lists.xen.org

On Mon, 2013-02-04 at 13:51 +0000, Wei Liu wrote:
> On Mon, 2013-02-04 at 13:47 +0000, Ian Campbell wrote:
> > On Mon, 2013-02-04 at 13:45 +0000, Wei Liu wrote:
> > 
> > > /*
> > >  * Note to 3-level event channel users:
> > >  * Only enable 3-level event channel for Dom0 or driver domains, because
> > >  * 3-level event channels consumes (16 + nr_vcpus pages) global mapping
> > >  * area in Xen.
> > >  */
> > 
> > Can this be enforced by the system administrator?
> > 
> 
> Knowing a domain is Dom0 is easy, but is it possible to know a domain is
> driver domain?

The admin knows, at the very least they need to have a manual override
(or maybe this should even default off for non-dom0)

Ian.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
  2013-02-04 13:54               ` Ian Campbell
@ 2013-02-04 13:59                 ` Wei Liu
  2013-02-04 14:22                   ` Ian Campbell
  0 siblings, 1 reply; 32+ messages in thread
From: Wei Liu @ 2013-02-04 13:59 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Tim (Xen.org), wei.liu2, David Vrabel, Jan Beulich,
	xen-devel@lists.xen.org

On Mon, 2013-02-04 at 13:54 +0000, Ian Campbell wrote:
> On Mon, 2013-02-04 at 13:51 +0000, Wei Liu wrote:
> > On Mon, 2013-02-04 at 13:47 +0000, Ian Campbell wrote:
> > > On Mon, 2013-02-04 at 13:45 +0000, Wei Liu wrote:
> > > 
> > > > /*
> > > >  * Note to 3-level event channel users:
> > > >  * Only enable 3-level event channel for Dom0 or driver domains, because
> > > >  * 3-level event channels consumes (16 + nr_vcpus pages) global mapping
> > > >  * area in Xen.
> > > >  */
> > > 
> > > Can this be enforced by the system administrator?
> > > 
> > 
> > Knowing a domain is Dom0 is easy, but is it possible to know a domain is
> > driver domain?
> 
> The admin knows, at the very least they need to have a manual override
> (or maybe this should even default off for non-dom0)
> 

Do you mean maintaining a whitelist in Xen or adding an option to the
guest kernel? I already have the latter in my kernel patch series - it
only enables 3-level event channels for Dom0. I also used to propose a
kernel option for overriding this, but Konrad didn't like it.


Wei.

> Ian.
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
  2013-02-04 13:45         ` Wei Liu
  2013-02-04 13:47           ` Ian Campbell
@ 2013-02-04 14:06           ` Jan Beulich
  2013-02-04 14:36             ` Wei Liu
  1 sibling, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2013-02-04 14:06 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel@lists.xen.org, Tim (Xen.org), David Vrabel, Ian Campbell

>>> On 04.02.13 at 14:45, Wei Liu <wei.liu2@citrix.com> wrote:
> On Mon, 2013-02-04 at 11:29 +0000, Jan Beulich wrote:
>> >> 
>> >> So this alone already is up to 16 pages per guest, and hence a
>> >> theoretical maximum of 512k pages, i.e. 2G mapped space.
>> > 
>> > That's given a theoretical 32k guests? Ouch. It also ignores the need
>> > for other global mappings.
>> > 
>> > on the flip side only a minority of domains are likely to be using the
>> > extended scheme, and I expect even those which are would not be using
>> > all 16 pages, so maybe we can fault them in on demand as we bind/unbind
>> > evtchns.
>> > 
>> > Where does 16 come from? How many pages to we end up with at each level
>> > in the new scheme?
>> 
>> Patch 11 defines EVTCHN_MAX_L3_PAGES to be 8, and we've
>> got two of them (pending and mask bits).
>> 
>> > Some levels of the trie are per-VCPU, did you account for that already
>> > in the 2GB?
>> 
>> No, I didn't, as it would only increase the number, and make
>> the math less clear.
>> 
>> >>  The
>> >> global page mapping area, however, is only 1Gb in size on x86-64
>> >> (didn't check ARM at all)...
>> > 
>> > There isn't currently a global page mapping area on 32-bit ARM (I
>> > suppose we have avoided them somehow...) but obviously 2G would be a
>> > problem in a 4GB address space.
>> > 
>> > On ARM we currently have 2G for domheap mappings which I suppose we
>> > would split if we needed a global page map
>> > 
>> > These need to be global so we can deliver evtchns to VCPUs which aren't
>> > running, right? I suppose mapping on demand (other than for a running
>> > VCPU) would be prohibitively expensive.
>> 
>> Likely, especially for high rate ones.
>> 
>> > Could we make this space per-VCPU (or per-domain) by saying that a
>> > domain maps its own evtchn pages plus the required pages from other
>> > domains with which an evtchn is bound? Might be tricky to arrange
>> > though, especially with the per-VCPU pages and affinity changes?
>> 
>> Even without that trickiness it wouldn't work I'm afraid: In various
>> cases we need to be able to raise the events out of context (timer,
>> IRQs from passed through devices).
>> 
>> Jan
> 
> So I come up with following comment on the 3-level registration
> interface (not specific to __map_l3_array() function).
> 
> /*
>  * Note to 3-level event channel users:
>  * Only enable 3-level event channel for Dom0 or driver domains, because
>  * 3-level event channels consumes (16 + nr_vcpus pages) global mapping
>  * area in Xen.
>  */

So you intended to fail the request for other guests? That's fine
with me in principle, but how do you tell a driver domain from an
"ordinary" one?

Jan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
  2013-02-04 13:59                 ` Wei Liu
@ 2013-02-04 14:22                   ` Ian Campbell
  2013-02-04 14:24                     ` Wei Liu
  0 siblings, 1 reply; 32+ messages in thread
From: Ian Campbell @ 2013-02-04 14:22 UTC (permalink / raw)
  To: Wei Liu; +Cc: Tim (Xen.org), David Vrabel, Jan Beulich, xen-devel@lists.xen.org

On Mon, 2013-02-04 at 13:59 +0000, Wei Liu wrote:
> On Mon, 2013-02-04 at 13:54 +0000, Ian Campbell wrote:
> > On Mon, 2013-02-04 at 13:51 +0000, Wei Liu wrote:
> > > On Mon, 2013-02-04 at 13:47 +0000, Ian Campbell wrote:
> > > > On Mon, 2013-02-04 at 13:45 +0000, Wei Liu wrote:
> > > > 
> > > > > /*
> > > > >  * Note to 3-level event channel users:
> > > > >  * Only enable 3-level event channel for Dom0 or driver domains, because
> > > > >  * 3-level event channels consumes (16 + nr_vcpus pages) global mapping
> > > > >  * area in Xen.
> > > > >  */
> > > > 
> > > > Can this be enforced by the system administrator?
> > > > 
> > > 
> > > Knowing a domain is Dom0 is easy, but is it possible to know a domain is
> > > driver domain?
> > 
> > The admin knows, at the very least they need to have a manual override
> > (or maybe this should even default off for non-dom0)
> > 
> 
> Do you mean maintaining white list in Xen or adding options in guest
> kernel?

I mean that it should be a property of the domain (i.e. a flag in struct
domain or whatever) whether they can use 3-levels and this should be
settable by the host administrator when they build the guest.

> I already have that in my kernel patch series - only enable
> 3-level event channel for Dom0.

Imagine I am a malicious user of your cloud service: I could potentially
create dozens of guests using kernels which forcibly try to use 3-level
evtchns and suck up loads of host RAM.

Ian.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
  2013-02-04 14:22                   ` Ian Campbell
@ 2013-02-04 14:24                     ` Wei Liu
  0 siblings, 0 replies; 32+ messages in thread
From: Wei Liu @ 2013-02-04 14:24 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Tim (Xen.org), wei.liu2, David Vrabel, Jan Beulich,
	xen-devel@lists.xen.org

On Mon, 2013-02-04 at 14:22 +0000, Ian Campbell wrote:
> On Mon, 2013-02-04 at 13:59 +0000, Wei Liu wrote:
> > On Mon, 2013-02-04 at 13:54 +0000, Ian Campbell wrote:
> > > On Mon, 2013-02-04 at 13:51 +0000, Wei Liu wrote:
> > > > On Mon, 2013-02-04 at 13:47 +0000, Ian Campbell wrote:
> > > > > On Mon, 2013-02-04 at 13:45 +0000, Wei Liu wrote:
> > > > > 
> > > > > > /*
> > > > > >  * Note to 3-level event channel users:
> > > > > >  * Only enable 3-level event channels for Dom0 or driver domains, because
> > > > > >  * 3-level event channels consume (16 + nr_vcpus) pages of global mapping
> > > > > >  * area in Xen.
> > > > > >  */
> > > > > 
> > > > > Can this be enforced by the system administrator?
> > > > > 
> > > > 
> > > > Knowing a domain is Dom0 is easy, but is it possible to know a domain is
> > > > a driver domain?
> > > 
> > > The admin knows, at the very least they need to have a manual override
> > > (or maybe this should even default off for non-dom0)
> > > 
> > 
> > Do you mean maintaining white list in Xen or adding options in guest
> > kernel?
> 
> I mean that it should be a property of the domain (i.e. a flag in struct
> domain or whatever) whether they can use 3-levels and this should be
> settable by the host administrator when they build the guest.
> 

I'm looking at this now; right after I sent my email I realized that we
cannot trust users at all...

> > I already have that in my kernel patch series - only enable
> > 3-level event channel for Dom0.
> 
> Imagine I am a malicious user of you cloud service, I could potentially
> create dozens of guests using kernels which forcibly try to use 3-level
> evtchns and suck up loads of host RAM.
> 

Right.

Wei.

> Ian.
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages
  2013-02-04 14:06           ` Jan Beulich
@ 2013-02-04 14:36             ` Wei Liu
  0 siblings, 0 replies; 32+ messages in thread
From: Wei Liu @ 2013-02-04 14:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim (Xen.org), xen-devel@lists.xen.org, wei.liu2, David Vrabel,
	Ian Campbell

On Mon, 2013-02-04 at 14:06 +0000, Jan Beulich wrote:
> >>> On 04.02.13 at 14:45, Wei Liu <wei.liu2@citrix.com> wrote:
> > On Mon, 2013-02-04 at 11:29 +0000, Jan Beulich wrote:
> >> >> 
> >> >> So this alone already is up to 16 pages per guest, and hence a
> >> >> theoretical maximum of 512k pages, i.e. 2G mapped space.
> >> > 
> >> > That's given a theoretical 32k guests? Ouch. It also ignores the need
> >> > for other global mappings.
> >> > 
> >> > on the flip side only a minority of domains are likely to be using the
> >> > extended scheme, and I expect even those which are would not be using
> >> > all 16 pages, so maybe we can fault them in on demand as we bind/unbind
> >> > evtchns.
> >> > 
> >> > Where does 16 come from? How many pages to we end up with at each level
> >> > in the new scheme?
> >> 
> >> Patch 11 defines EVTCHN_MAX_L3_PAGES to be 8, and we've
> >> got two of them (pending and mask bits).
> >> 
> >> > Some levels of the trie are per-VCPU, did you account for that already
> >> > in the 2GB?
> >> 
> >> No, I didn't, as it would only increase the number, and make
> >> the math less clear.
> >> 
> >> >>  The
> >> >> global page mapping area, however, is only 1Gb in size on x86-64
> >> >> (didn't check ARM at all)...
> >> > 
> >> > There isn't currently a global page mapping area on 32-bit ARM (I
> >> > suppose we have avoided them somehow...) but obviously 2G would be a
> >> > problem in a 4GB address space.
> >> > 
> >> > On ARM we currently have 2G for domheap mappings which I suppose we
> >> > would split if we needed a global page map
> >> > 
> >> > These need to be global so we can deliver evtchns to VCPUs which aren't
> >> > running, right? I suppose mapping on demand (other than for a running
> >> > VCPU) would be prohibitively expensive.
> >> 
> >> Likely, especially for high rate ones.
> >> 
> >> > Could we make this space per-VCPU (or per-domain) by saying that a
> >> > domain maps its own evtchn pages plus the required pages from other
> >> > domains with which an evtchn is bound? Might be tricky to arrange
> >> > though, especially with the per-VCPU pages and affinity changes?
> >> 
> >> Even without that trickiness it wouldn't work I'm afraid: In various
> >> cases we need to be able to raise the events out of context (timer,
> >> IRQs from passed through devices).
> >> 
> >> Jan
> > 
> > So I come up with following comment on the 3-level registration
> > interface (not specific to __map_l3_array() function).
> > 
> > /*
> >  * Note to 3-level event channel users:
> >  * Only enable 3-level event channels for Dom0 or driver domains, because
> >  * 3-level event channels consume (16 + nr_vcpus) pages of global mapping
> >  * area in Xen.
> >  */
> 
> So you intended to fail the request for other guests? That's fine
> with me in principle, but how do you tell a driver domain from an
> "ordinary" one?
> 

I can't at the moment. I'm investigating adding a flag to the domain
creation process.


Wei.

> Jan
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2013-02-04 14:36 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-31 14:42 [PATCH 00/16] Implement 3-level event channel in Xen Wei Liu
2013-01-31 14:42 ` [PATCH 01/16] Remove trailing whitespaces in sched.h Wei Liu
2013-01-31 14:42 ` [PATCH 02/16] Remove trailing whitespaces in event.h Wei Liu
2013-01-31 14:42 ` [PATCH 03/16] Remove trailing whitespaces in xen.h Wei Liu
2013-01-31 14:42 ` [PATCH 04/16] Move event channel macros / struct definition to proper place Wei Liu
2013-02-04  9:00   ` Jan Beulich
2013-02-04 10:25     ` Wei Liu
2013-01-31 14:42 ` [PATCH 05/16] Add evtchn_level in struct domain Wei Liu
2013-01-31 14:42 ` [PATCH 06/16] Dynamically allocate d->evtchn Wei Liu
2013-01-31 14:42 ` [PATCH 07/16] Bump EVTCHNS_PER_BUCKET to 512 Wei Liu
2013-01-31 14:42 ` [PATCH 08/16] Add evtchn_is_{pending, masked} and evtchn_clear_pending Wei Liu
2013-01-31 14:42 ` [PATCH 09/16] Introduce some macros for event channels Wei Liu
2013-01-31 14:42 ` [PATCH 10/16] Update Xen public header Wei Liu
2013-01-31 14:42 ` [PATCH 11/16] Define N-level event channel registration interface Wei Liu
2013-01-31 14:43 ` [PATCH 12/16] Add control structures for 3-level event channel Wei Liu
2013-01-31 14:43 ` [PATCH 13/16] Make NR_EVTCHN_BUCKETS 3-level ready Wei Liu
2013-01-31 14:43 ` [PATCH 14/16] Genneralized event channel operations Wei Liu
2013-01-31 14:43 ` [PATCH 15/16] Infrastructure for manipulating 3-level event channel pages Wei Liu
2013-02-04  9:23   ` Jan Beulich
2013-02-04 11:20     ` Ian Campbell
2013-02-04 11:29       ` Jan Beulich
2013-02-04 13:45         ` Wei Liu
2013-02-04 13:47           ` Ian Campbell
2013-02-04 13:51             ` Wei Liu
2013-02-04 13:54               ` Ian Campbell
2013-02-04 13:59                 ` Wei Liu
2013-02-04 14:22                   ` Ian Campbell
2013-02-04 14:24                     ` Wei Liu
2013-02-04 14:06           ` Jan Beulich
2013-02-04 14:36             ` Wei Liu
2013-02-04 11:37       ` Wei Liu
2013-01-31 14:43 ` [PATCH 16/16] Implement 3-level event channel routines Wei Liu
