[PATCHv2 0/3] Xen: FIFO-based event channel ABI fixes

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

* [PATCHv2 0/3] Xen: FIFO-based event channel ABI fixes
@ 2013-10-31 15:03 David Vrabel
  2013-10-31 15:03 ` [PATCH 1/3] MAINTAINERS: Add FIFO-based event channel ABI maintainer David Vrabel
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: David Vrabel @ 2013-10-31 15:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, David Vrabel, Jan Beulich

This series addresses two design flaws in the FIFO-based event channel
ABI.

1. Allow guests to expand the array after binding events.

2. Fix a potential DoS caused by an unbounded loop when setting LINK.

An updated design document is available from:

http://xenbits.xen.org/people/dvrabel/event-channels-G.pdf

- Add section on memory barriers.
- EVTCHNOP expand array should be called after binding events.
- Guest clearing MASKED on a tail event is no longer valid, a
  hypercall is required. This allows the loop in the Xen's link()
  function to be bounded.

v8 of the Linux patches will be posted shortly.

Changes since v1:
- Add MAINTAINERS patch
- Remove some unnecessary temporary pending state clears
- Add fix for DoS

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/3] MAINTAINERS: Add FIFO-based event channel ABI maintainer
  2013-10-31 15:03 [PATCHv2 0/3] Xen: FIFO-based event channel ABI fixes David Vrabel
@ 2013-10-31 15:03 ` David Vrabel
  2013-11-04 14:29   ` Jan Beulich
  2013-10-31 15:03 ` [PATCH 2/3] evtchn: don't lose pending state if FIFO event array page is missing David Vrabel
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: David Vrabel @ 2013-10-31 15:03 UTC (permalink / raw)
  To: xen-devel; +Cc: David Vrabel, Keir Fraser, David Vrabel, Jan Beulich

From: David Vrabel <dvrabel@cantab.net>

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 MAINTAINERS |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index adacac2..d7825c0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -156,6 +156,12 @@ F:     xen/include/efi/
 F:     xen/include/asm-x86/efi*.h
 F:     xen/include/asm-x86/x86_*/efi*.h
 
+EVENT CHANNELS (FIFO-BASED ABI)
+M:	David Vrabel <david.vrabel@citrix.com>
+S:	Supported
+F:	xen/common/event_fifo.c
+F:	xen/include/xen/event_fifo.h
+
 GDBSX DEBUGGER
 M:	Mukesh Rathor <mukesh.rathor@oracle.com>
 S:	Supported
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] MAINTAINERS: Add FIFO-based event channel ABI maintainer
  2013-10-31 15:03 ` [PATCH 1/3] MAINTAINERS: Add FIFO-based event channel ABI maintainer David Vrabel
@ 2013-11-04 14:29   ` Jan Beulich
  2013-11-05 21:06     ` Keir Fraser
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2013-11-04 14:29 UTC (permalink / raw)
  To: David Vrabel; +Cc: David Vrabel, xen-devel, Keir Fraser

>>> On 31.10.13 at 16:03, David Vrabel <david.vrabel@citrix.com> wrote:
> From: David Vrabel <dvrabel@cantab.net>
> 

Suggested-by: Jan Beulich <jbeulich@suse.com>

> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
>  MAINTAINERS |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index adacac2..d7825c0 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -156,6 +156,12 @@ F:     xen/include/efi/
>  F:     xen/include/asm-x86/efi*.h
>  F:     xen/include/asm-x86/x86_*/efi*.h
>  
> +EVENT CHANNELS (FIFO-BASED ABI)
> +M:	David Vrabel <david.vrabel@citrix.com>
> +S:	Supported
> +F:	xen/common/event_fifo.c
> +F:	xen/include/xen/event_fifo.h
> +
>  GDBSX DEBUGGER
>  M:	Mukesh Rathor <mukesh.rathor@oracle.com>
>  S:	Supported
> -- 
> 1.7.2.5
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org 
> http://lists.xen.org/xen-devel 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] MAINTAINERS: Add FIFO-based event channel ABI maintainer
  2013-11-04 14:29   ` Jan Beulich
@ 2013-11-05 21:06     ` Keir Fraser
  2013-11-06 11:49       ` David Vrabel
  0 siblings, 1 reply; 25+ messages in thread
From: Keir Fraser @ 2013-11-05 21:06 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel; +Cc: David Vrabel, xen-devel

On 04/11/2013 14:29, "Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 31.10.13 at 16:03, David Vrabel <david.vrabel@citrix.com> wrote:
>> From: David Vrabel <dvrabel@cantab.net>
>> 
> 
> Suggested-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Keir Fraser <keir@xen.org>

>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>> ---
>>  MAINTAINERS |    6 ++++++
>>  1 files changed, 6 insertions(+), 0 deletions(-)
>> 
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index adacac2..d7825c0 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -156,6 +156,12 @@ F:     xen/include/efi/
>>  F:     xen/include/asm-x86/efi*.h
>>  F:     xen/include/asm-x86/x86_*/efi*.h
>>  
>> +EVENT CHANNELS (FIFO-BASED ABI)
>> +M: David Vrabel <david.vrabel@citrix.com>
>> +S: Supported
>> +F: xen/common/event_fifo.c
>> +F: xen/include/xen/event_fifo.h
>> +
>>  GDBSX DEBUGGER
>>  M: Mukesh Rathor <mukesh.rathor@oracle.com>
>>  S: Supported
>> -- 
>> 1.7.2.5
>> 
>> 
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
> 
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] MAINTAINERS: Add FIFO-based event channel ABI maintainer
  2013-11-05 21:06     ` Keir Fraser
@ 2013-11-06 11:49       ` David Vrabel
  2013-11-06 12:40         ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: David Vrabel @ 2013-11-06 11:49 UTC (permalink / raw)
  To: Keir Fraser; +Cc: David Vrabel, xen-devel, Jan Beulich

On 05/11/13 21:06, Keir Fraser wrote:
> On 04/11/2013 14:29, "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>>>>> On 31.10.13 at 16:03, David Vrabel <david.vrabel@citrix.com> wrote:
>>> From: David Vrabel <dvrabel@cantab.net>
>>>
>>
>> Suggested-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Keir Fraser <keir@xen.org>

Thanks.

Jan, can you either fix up the author to use my citrix.com email or wait
for me to repost v3 of this series.

David

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] MAINTAINERS: Add FIFO-based event channel ABI maintainer
  2013-11-06 11:49       ` David Vrabel
@ 2013-11-06 12:40         ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2013-11-06 12:40 UTC (permalink / raw)
  To: David Vrabel; +Cc: David Vrabel, xen-devel, Keir Fraser

>>> On 06.11.13 at 12:49, David Vrabel <david.vrabel@citrix.com> wrote:
> On 05/11/13 21:06, Keir Fraser wrote:
>> On 04/11/2013 14:29, "Jan Beulich" <JBeulich@suse.com> wrote:
>> 
>>>>>> On 31.10.13 at 16:03, David Vrabel <david.vrabel@citrix.com> wrote:
>>>> From: David Vrabel <dvrabel@cantab.net>
>>>>
>>>
>>> Suggested-by: Jan Beulich <jbeulich@suse.com>
>> 
>> Acked-by: Keir Fraser <keir@xen.org>
> 
> Thanks.
> 
> Jan, can you either fix up the author to use my citrix.com email or wait
> for me to repost v3 of this series.

I was about to say "too late", but git has authorship as you want
it (I'm generally purging the From: lines when they appear to match
the first Signed-off-by, as the result would be the same with my
commit script, and it saves me from accidentally leaving the From:
in the commit message).

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 2/3] evtchn: don't lose pending state if FIFO event array page is missing
  2013-10-31 15:03 [PATCHv2 0/3] Xen: FIFO-based event channel ABI fixes David Vrabel
  2013-10-31 15:03 ` [PATCH 1/3] MAINTAINERS: Add FIFO-based event channel ABI maintainer David Vrabel
@ 2013-10-31 15:03 ` David Vrabel
  2013-11-04 14:29   ` Jan Beulich
  2013-10-31 15:03 ` [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK David Vrabel
  2013-10-31 15:13 ` [PATCHv2 0/3] Xen: FIFO-based event channel ABI fixes David Vrabel
  3 siblings, 1 reply; 25+ messages in thread
From: David Vrabel @ 2013-10-31 15:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, David Vrabel, Jan Beulich

From: David Vrabel <david.vrabel@citrix.com>

When the FIFO-based ABI is in use, if an event is bound when the
corresponding event array page is missing any attempt to set the event
pending will lose the event (because there is nowhere to write the
pending state).

This wasn't initially considered an issue because guests were expected
to only bind events once they had expanded the event array, however:

1. A domain may start with events already bound (by the toolstack).

2. The guest does not know what the port number will be until the
   event is bound (it doesn't know how many already bound events there
   are), so it does not know how many event array pages are required.
   This makes it difficult to expand in advanced (the current Linux
   implementation expands after binding for example).

To prevent pending events from being lost because there is no array
page, temporarily store the pending state in evtchn->pending.  When an
array page is added, use this state to set the port as pending.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/event_fifo.c |   47 +++++++++++++++++++++++++++++++++++++++++------
 xen/include/xen/sched.h |    1 +
 2 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
index bec8d87..64dbfff 100644
--- a/xen/common/event_fifo.c
+++ b/xen/common/event_fifo.c
@@ -61,8 +61,16 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
 
     port = evtchn->port;
     word = evtchn_fifo_word_from_port(d, port);
+
+    /*
+     * Event array page may not exist yet, save the pending state for
+     * when the page is added.
+     */
     if ( unlikely(!word) )
+    {
+        evtchn->pending = 1;
         return;
+    }
 
     /*
      * No locking around getting the queue. This may race with
@@ -322,16 +330,29 @@ static void cleanup_event_array(struct domain *d)
     xfree(d->evtchn_fifo);
 }
 
-static void set_priority_all(struct domain *d, unsigned int priority)
+static void setup_ports(struct domain *d)
 {
     unsigned int port;
 
+    /*
+     * For each port that is already bound:
+     *
+     * - save its pending state.
+     * - set default priority.
+     */
     for ( port = 1; port < d->max_evtchns; port++ )
     {
+        struct evtchn *evtchn;
+
         if ( !port_is_valid(d, port) )
             break;
 
-        evtchn_port_set_priority(d, evtchn_from_port(d, port), priority);
+        evtchn = evtchn_from_port(d, port);
+
+        if ( test_bit(port, &shared_info(d, evtchn_pending)) )
+            evtchn->pending = 1;
+
+        evtchn_fifo_set_priority(d, evtchn, EVTCHN_FIFO_PRIORITY_DEFAULT);
     }
 }
 
@@ -369,9 +390,6 @@ int evtchn_fifo_init_control(struct evtchn_init_control *init_control)
     /*
      * If this is the first control block, setup an empty event array
      * and switch to the fifo port ops.
-     *
-     * Any ports currently bound will have their priority set to the
-     * default.
      */
     if ( rc == 0 && !d->evtchn_fifo )
     {
@@ -382,7 +400,7 @@ int evtchn_fifo_init_control(struct evtchn_init_control *init_control)
         {
             d->evtchn_port_ops = &evtchn_port_ops_fifo;
             d->max_evtchns = EVTCHN_FIFO_NR_CHANNELS;
-            set_priority_all(d, EVTCHN_FIFO_PRIORITY_DEFAULT);
+            setup_ports(d);
         }
     }
 
@@ -395,6 +413,7 @@ static int add_page_to_event_array(struct domain *d, unsigned long gfn)
 {
     void *virt;
     unsigned int slot;
+    unsigned int port = d->evtchn_fifo->num_evtchns;
     int rc;
 
     slot = d->evtchn_fifo->num_evtchns / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE;
@@ -408,6 +427,22 @@ static int add_page_to_event_array(struct domain *d, unsigned long gfn)
     d->evtchn_fifo->event_array[slot] = virt;
     d->evtchn_fifo->num_evtchns += EVTCHN_FIFO_EVENT_WORDS_PER_PAGE;
 
+    /*
+     * Re-raise any events that were pending while this array page was
+     * missing.
+     */
+    for ( ; port < d->evtchn_fifo->num_evtchns; port++ )
+    {
+        struct evtchn *evtchn;
+
+        if ( !port_is_valid(d, port) )
+            break;
+
+        evtchn = evtchn_from_port(d, port);
+        if ( evtchn->pending )
+            evtchn_fifo_set_pending(d->vcpu[evtchn->notify_vcpu_id], evtchn);
+    }
+
     return 0;
 }
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 25bf637..624ea15 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -97,6 +97,7 @@ struct evtchn
         u16 virq;      /* state == ECS_VIRQ */
     } u;
     u8 priority;
+    u8 pending:1;
 #ifdef FLASK_ENABLE
     void *ssid;
 #endif
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] evtchn: don't lose pending state if FIFO event array page is missing
  2013-10-31 15:03 ` [PATCH 2/3] evtchn: don't lose pending state if FIFO event array page is missing David Vrabel
@ 2013-11-04 14:29   ` Jan Beulich
  2013-11-05 21:07     ` Keir Fraser
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2013-11-04 14:29 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser

>>> On 31.10.13 at 16:03, David Vrabel <david.vrabel@citrix.com> wrote:
> From: David Vrabel <david.vrabel@citrix.com>
> 
> When the FIFO-based ABI is in use, if an event is bound when the
> corresponding event array page is missing any attempt to set the event
> pending will lose the event (because there is nowhere to write the
> pending state).
> 
> This wasn't initially considered an issue because guests were expected
> to only bind events once they had expanded the event array, however:
> 
> 1. A domain may start with events already bound (by the toolstack).
> 
> 2. The guest does not know what the port number will be until the
>    event is bound (it doesn't know how many already bound events there
>    are), so it does not know how many event array pages are required.
>    This makes it difficult to expand in advanced (the current Linux
>    implementation expands after binding for example).
> 
> To prevent pending events from being lost because there is no array
> page, temporarily store the pending state in evtchn->pending.  When an
> array page is added, use this state to set the port as pending.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> ---
>  xen/common/event_fifo.c |   47 +++++++++++++++++++++++++++++++++++++++++------
>  xen/include/xen/sched.h |    1 +
>  2 files changed, 42 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
> index bec8d87..64dbfff 100644
> --- a/xen/common/event_fifo.c
> +++ b/xen/common/event_fifo.c
> @@ -61,8 +61,16 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct 
> evtchn *evtchn)
>  
>      port = evtchn->port;
>      word = evtchn_fifo_word_from_port(d, port);
> +
> +    /*
> +     * Event array page may not exist yet, save the pending state for
> +     * when the page is added.
> +     */
>      if ( unlikely(!word) )
> +    {
> +        evtchn->pending = 1;
>          return;
> +    }
>  
>      /*
>       * No locking around getting the queue. This may race with
> @@ -322,16 +330,29 @@ static void cleanup_event_array(struct domain *d)
>      xfree(d->evtchn_fifo);
>  }
>  
> -static void set_priority_all(struct domain *d, unsigned int priority)
> +static void setup_ports(struct domain *d)
>  {
>      unsigned int port;
>  
> +    /*
> +     * For each port that is already bound:
> +     *
> +     * - save its pending state.
> +     * - set default priority.
> +     */
>      for ( port = 1; port < d->max_evtchns; port++ )
>      {
> +        struct evtchn *evtchn;
> +
>          if ( !port_is_valid(d, port) )
>              break;
>  
> -        evtchn_port_set_priority(d, evtchn_from_port(d, port), priority);
> +        evtchn = evtchn_from_port(d, port);
> +
> +        if ( test_bit(port, &shared_info(d, evtchn_pending)) )
> +            evtchn->pending = 1;
> +
> +        evtchn_fifo_set_priority(d, evtchn, EVTCHN_FIFO_PRIORITY_DEFAULT);
>      }
>  }
>  
> @@ -369,9 +390,6 @@ int evtchn_fifo_init_control(struct evtchn_init_control 
> *init_control)
>      /*
>       * If this is the first control block, setup an empty event array
>       * and switch to the fifo port ops.
> -     *
> -     * Any ports currently bound will have their priority set to the
> -     * default.
>       */
>      if ( rc == 0 && !d->evtchn_fifo )
>      {
> @@ -382,7 +400,7 @@ int evtchn_fifo_init_control(struct evtchn_init_control 
> *init_control)
>          {
>              d->evtchn_port_ops = &evtchn_port_ops_fifo;
>              d->max_evtchns = EVTCHN_FIFO_NR_CHANNELS;
> -            set_priority_all(d, EVTCHN_FIFO_PRIORITY_DEFAULT);
> +            setup_ports(d);
>          }
>      }
>  
> @@ -395,6 +413,7 @@ static int add_page_to_event_array(struct domain *d, 
> unsigned long gfn)
>  {
>      void *virt;
>      unsigned int slot;
> +    unsigned int port = d->evtchn_fifo->num_evtchns;
>      int rc;
>  
>      slot = d->evtchn_fifo->num_evtchns / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE;
> @@ -408,6 +427,22 @@ static int add_page_to_event_array(struct domain *d, 
> unsigned long gfn)
>      d->evtchn_fifo->event_array[slot] = virt;
>      d->evtchn_fifo->num_evtchns += EVTCHN_FIFO_EVENT_WORDS_PER_PAGE;
>  
> +    /*
> +     * Re-raise any events that were pending while this array page was
> +     * missing.
> +     */
> +    for ( ; port < d->evtchn_fifo->num_evtchns; port++ )
> +    {
> +        struct evtchn *evtchn;
> +
> +        if ( !port_is_valid(d, port) )
> +            break;
> +
> +        evtchn = evtchn_from_port(d, port);
> +        if ( evtchn->pending )
> +            evtchn_fifo_set_pending(d->vcpu[evtchn->notify_vcpu_id], evtchn);
> +    }
> +
>      return 0;
>  }
>  
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 25bf637..624ea15 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -97,6 +97,7 @@ struct evtchn
>          u16 virq;      /* state == ECS_VIRQ */
>      } u;
>      u8 priority;
> +    u8 pending:1;
>  #ifdef FLASK_ENABLE
>      void *ssid;
>  #endif
> -- 
> 1.7.2.5
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org 
> http://lists.xen.org/xen-devel 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] evtchn: don't lose pending state if FIFO event array page is missing
  2013-11-04 14:29   ` Jan Beulich
@ 2013-11-05 21:07     ` Keir Fraser
  0 siblings, 0 replies; 25+ messages in thread
From: Keir Fraser @ 2013-11-05 21:07 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel; +Cc: xen-devel

On 04/11/2013 14:29, "Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 31.10.13 at 16:03, David Vrabel <david.vrabel@citrix.com> wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>> 
>> When the FIFO-based ABI is in use, if an event is bound when the
>> corresponding event array page is missing any attempt to set the event
>> pending will lose the event (because there is nowhere to write the
>> pending state).
>> 
>> This wasn't initially considered an issue because guests were expected
>> to only bind events once they had expanded the event array, however:
>> 
>> 1. A domain may start with events already bound (by the toolstack).
>> 
>> 2. The guest does not know what the port number will be until the
>>    event is bound (it doesn't know how many already bound events there
>>    are), so it does not know how many event array pages are required.
>>    This makes it difficult to expand in advanced (the current Linux
>>    implementation expands after binding for example).
>> 
>> To prevent pending events from being lost because there is no array
>> page, temporarily store the pending state in evtchn->pending.  When an
>> array page is added, use this state to set the port as pending.
>> 
>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Keir Fraser <keir@xen.org>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK
  2013-10-31 15:03 [PATCHv2 0/3] Xen: FIFO-based event channel ABI fixes David Vrabel
  2013-10-31 15:03 ` [PATCH 1/3] MAINTAINERS: Add FIFO-based event channel ABI maintainer David Vrabel
  2013-10-31 15:03 ` [PATCH 2/3] evtchn: don't lose pending state if FIFO event array page is missing David Vrabel
@ 2013-10-31 15:03 ` David Vrabel
  2013-10-31 18:13   ` Boris Ostrovsky
                     ` (5 more replies)
  2013-10-31 15:13 ` [PATCHv2 0/3] Xen: FIFO-based event channel ABI fixes David Vrabel
  3 siblings, 6 replies; 25+ messages in thread
From: David Vrabel @ 2013-10-31 15:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, David Vrabel, Jan Beulich

From: David Vrabel <david.vrabel@citrix.com>

A malicious or buggy guest can cause another domain to spin
indefinitely by repeatedly writing to an event word when the other
domain is trying to link a new event.  The cmpxchg() in
evtchn_fifo_set_link() will repeatedly fail and the loop may never
terminate.

Fixing this requires a minor change to the ABI, which is documented in
draft G of the design.

http://xenbits.xen.org/people/dvrabel/event-channels-G.pdf

Since a well-behaved guest only makes a limited set of state changes,
the loop can terminate early if the guest makes an invalid state
transition.

The guest may:

- clear LINKED and link
- clear PENDING
- set MASKED
- clear MASKED

It is valid for the guest to mask and unmask an event at any time so
we specify that it is not valid for a guest to clear MASKED if the
event is the tail of a queue (i.e., LINKED is set and LINK is clear).
Instead, the guest must make an EVCHNOP_unmask hypercall to unmask the
event.

The hypercall ensures that UNMASKED isn't cleared on a tail event
whose LINK field is being set by holding the appropriate queue lock.

The remaining valid writes (clear LINKED, clear PENDING, set MASKED)
will limit the number of failures of the cmpxchg() to at most 3.  A
clear of LINKED will also terminate the loop early (as before).
Therefore, the loop can then be limited to at most 3 iterations.

If the buggy or malicious guest does cause the loop to exit early, the
newly pending event will be unreachable by the guest and it and
subsequent events may be lost.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/event_fifo.c |   19 +++++++++++++++++--
 xen/include/xen/sched.h |    3 +++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
index 64dbfff..6cf30ec 100644
--- a/xen/common/event_fifo.c
+++ b/xen/common/event_fifo.c
@@ -37,6 +37,7 @@ static inline event_word_t *evtchn_fifo_word_from_port(struct domain *d,
 static bool_t evtchn_fifo_set_link(event_word_t *word, uint32_t link)
 {
     event_word_t n, o, w;
+    unsigned int try = 3;
 
     w = *word;
 
@@ -45,7 +46,8 @@ static bool_t evtchn_fifo_set_link(event_word_t *word, uint32_t link)
             return 0;
         o = w;
         n = (w & ~EVTCHN_FIFO_LINK_MASK) | link;
-    } while ( (w = cmpxchg(word, o, n)) != o );
+        w = cmpxchg(word, o, n);
+    } while ( w != o && try-- );
 
     return 1;
 }
@@ -90,6 +92,8 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
         event_word_t *tail_word;
         bool_t linked = 0;
 
+        evtchn->q = q;
+
         spin_lock_irqsave(&q->lock, flags);
 
         /*
@@ -148,7 +152,18 @@ static void evtchn_fifo_unmask(struct domain *d, struct evtchn *evtchn)
     if ( unlikely(!word) )
         return;
 
-    clear_bit(EVTCHN_FIFO_MASKED, word);
+    /*
+     * If this event is on the tail of a queue, the queue lock must be
+     * held to avoid clearing MASKED while setting LINK.
+     */
+    if ( evtchn->q && evtchn->q->tail == evtchn->port )
+    {
+        spin_lock_irq(&evtchn->q->lock);
+        clear_bit(EVTCHN_FIFO_MASKED, word);
+        spin_unlock_irq(&evtchn->q->lock);
+    }
+    else
+        clear_bit(EVTCHN_FIFO_MASKED, word);
 
     /* Relink if pending. */
     if ( test_bit(EVTCHN_FIFO_PENDING, word) )
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 624ea15..7b0273a 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -68,6 +68,8 @@ extern struct domain *dom0;
 #define EVTCHNS_PER_GROUP  (BUCKETS_PER_GROUP * EVTCHNS_PER_BUCKET)
 #define NR_EVTCHN_GROUPS   DIV_ROUND_UP(MAX_NR_EVTCHNS, EVTCHNS_PER_GROUP)
 
+struct evtchn_fifo_queue;
+
 struct evtchn
 {
 #define ECS_FREE         0 /* Channel is available for use.                  */
@@ -98,6 +100,7 @@ struct evtchn
     } u;
     u8 priority;
     u8 pending:1;
+    struct evtchn_fifo_queue *q; /* Latest queue this event was on. */
 #ifdef FLASK_ENABLE
     void *ssid;
 #endif
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK
  2013-10-31 15:03 ` [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK David Vrabel
@ 2013-10-31 18:13   ` Boris Ostrovsky
  2013-11-04 14:39   ` Jan Beulich
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Boris Ostrovsky @ 2013-10-31 18:13 UTC (permalink / raw)
  To: David Vrabel; +Cc: Keir Fraser, Jan Beulich, xen-devel

On 10/31/2013 11:03 AM, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> A malicious or buggy guest can cause another domain to spin
> indefinitely by repeatedly writing to an event word when the other
> domain is trying to link a new event.  The cmpxchg() in
> evtchn_fifo_set_link() will repeatedly fail and the loop may never
> terminate.
>
> Fixing this requires a minor change to the ABI, which is documented in
> draft G of the design.
>
> http://xenbits.xen.org/people/dvrabel/event-channels-G.pdf
>
> Since a well-behaved guest only makes a limited set of state changes,
> the loop can terminate early if the guest makes an invalid state
> transition.
>
> The guest may:
>
> - clear LINKED and link
> - clear PENDING
> - set MASKED
> - clear MASKED
>
> It is valid for the guest to mask and unmask an event at any time so
> we specify that it is not valid for a guest to clear MASKED if the
> event is the tail of a queue (i.e., LINKED is set and LINK is clear).
> Instead, the guest must make an EVCHNOP_unmask hypercall to unmask the
> event.
>
> The hypercall ensures that UNMASKED isn't cleared on a tail event
> whose LINK field is being set by holding the appropriate queue lock.
>
> The remaining valid writes (clear LINKED, clear PENDING, set MASKED)
> will limit the number of failures of the cmpxchg() to at most 3.  A
> clear of LINKED will also terminate the loop early (as before).
> Therefore, the loop can then be limited to at most 3 iterations.
>
> If the buggy or malicious guest does cause the loop to exit early, the
> newly pending event will be unreachable by the guest and it and
> subsequent events may be lost.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
>   xen/common/event_fifo.c |   19 +++++++++++++++++--
>   xen/include/xen/sched.h |    3 +++
>   2 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
> index 64dbfff..6cf30ec 100644
> --- a/xen/common/event_fifo.c
> +++ b/xen/common/event_fifo.c
> @@ -37,6 +37,7 @@ static inline event_word_t *evtchn_fifo_word_from_port(struct domain *d,
>   static bool_t evtchn_fifo_set_link(event_word_t *word, uint32_t link)
>   {
>       event_word_t n, o, w;
> +    unsigned int try = 3;
>   
>       w = *word;
>   
> @@ -45,7 +46,8 @@ static bool_t evtchn_fifo_set_link(event_word_t *word, uint32_t link)
>               return 0;
>           o = w;
>           n = (w & ~EVTCHN_FIFO_LINK_MASK) | link;
> -    } while ( (w = cmpxchg(word, o, n)) != o );
> +        w = cmpxchg(word, o, n);
> +    } while ( w != o && try-- );
>   
>       return 1;

Failure to set link here is an unexpected outcome, possibly indicating a 
malicious guest. Should you emit a (rate-limited) warning for this case?

-boris

>   }
> @@ -90,6 +92,8 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
>           event_word_t *tail_word;
>           bool_t linked = 0;
>   
> +        evtchn->q = q;
> +
>           spin_lock_irqsave(&q->lock, flags);
>   
>           /*
> @@ -148,7 +152,18 @@ static void evtchn_fifo_unmask(struct domain *d, struct evtchn *evtchn)
>       if ( unlikely(!word) )
>           return;
>   
> -    clear_bit(EVTCHN_FIFO_MASKED, word);
> +    /*
> +     * If this event is on the tail of a queue, the queue lock must be
> +     * held to avoid clearing MASKED while setting LINK.
> +     */
> +    if ( evtchn->q && evtchn->q->tail == evtchn->port )
> +    {
> +        spin_lock_irq(&evtchn->q->lock);
> +        clear_bit(EVTCHN_FIFO_MASKED, word);
> +        spin_unlock_irq(&evtchn->q->lock);
> +    }
> +    else
> +        clear_bit(EVTCHN_FIFO_MASKED, word);
>   
>       /* Relink if pending. */
>       if ( test_bit(EVTCHN_FIFO_PENDING, word) )
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 624ea15..7b0273a 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -68,6 +68,8 @@ extern struct domain *dom0;
>   #define EVTCHNS_PER_GROUP  (BUCKETS_PER_GROUP * EVTCHNS_PER_BUCKET)
>   #define NR_EVTCHN_GROUPS   DIV_ROUND_UP(MAX_NR_EVTCHNS, EVTCHNS_PER_GROUP)
>   
> +struct evtchn_fifo_queue;
> +
>   struct evtchn
>   {
>   #define ECS_FREE         0 /* Channel is available for use.                  */
> @@ -98,6 +100,7 @@ struct evtchn
>       } u;
>       u8 priority;
>       u8 pending:1;
> +    struct evtchn_fifo_queue *q; /* Latest queue this event was on. */
>   #ifdef FLASK_ENABLE
>       void *ssid;
>   #endif

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK
  2013-10-31 15:03 ` [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK David Vrabel
  2013-10-31 18:13   ` Boris Ostrovsky
@ 2013-11-04 14:39   ` Jan Beulich
  2013-11-04 14:52     ` David Vrabel
  2013-11-05 14:19   ` Jan Beulich
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2013-11-04 14:39 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser

>>> On 31.10.13 at 16:03, David Vrabel <david.vrabel@citrix.com> wrote:
> From: David Vrabel <david.vrabel@citrix.com>
> 
> A malicious or buggy guest can cause another domain to spin
> indefinitely by repeatedly writing to an event word when the other
> domain is trying to link a new event.  The cmpxchg() in
> evtchn_fifo_set_link() will repeatedly fail and the loop may never
> terminate.

So here you talk of two guests (with me not immediately seeing
where that interaction comes from - is it that for an interdomain
event the receiver could harm the sender?), ...

> Fixing this requires a minor change to the ABI, which is documented in
> draft G of the design.
> 
> http://xenbits.xen.org/people/dvrabel/event-channels-G.pdf 
> 
> Since a well-behaved guest only makes a limited set of state changes,
> the loop can terminate early if the guest makes an invalid state
> transition.
> 
> The guest may:
> 
> - clear LINKED and link
> - clear PENDING
> - set MASKED
> - clear MASKED
> 
> It is valid for the guest to mask and unmask an event at any time so
> we specify that it is not valid for a guest to clear MASKED if the
> event is the tail of a queue (i.e., LINKED is set and LINK is clear).
> Instead, the guest must make an EVCHNOP_unmask hypercall to unmask the
> event.
> 
> The hypercall ensures that UNMASKED isn't cleared on a tail event
> whose LINK field is being set by holding the appropriate queue lock.
> 
> The remaining valid writes (clear LINKED, clear PENDING, set MASKED)
> will limit the number of failures of the cmpxchg() to at most 3.  A
> clear of LINKED will also terminate the loop early (as before).
> Therefore, the loop can then be limited to at most 3 iterations.
> 
> If the buggy or malicious guest does cause the loop to exit early, the
> newly pending event will be unreachable by the guest and it and
> subsequent events may be lost.

... yet here it is not really clear which guest the last "guest" refers
to (i.e. it's fine if the malicious guest harms itself, but the change
would be pointless if the malicious guest could still harm the other
one).

> @@ -90,6 +92,8 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
>          event_word_t *tail_word;
>          bool_t linked = 0;
>  
> +        evtchn->q = q;
> +
>          spin_lock_irqsave(&q->lock, flags);
>  
>          /*

I fail to see how this change is related to the rest of the patch.

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK
  2013-11-04 14:39   ` Jan Beulich
@ 2013-11-04 14:52     ` David Vrabel
  2013-11-04 14:57       ` Jan Beulich
  2013-11-04 15:07       ` Ian Campbell
  0 siblings, 2 replies; 25+ messages in thread
From: David Vrabel @ 2013-11-04 14:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser

On 04/11/13 14:39, Jan Beulich wrote:
>>>> On 31.10.13 at 16:03, David Vrabel <david.vrabel@citrix.com> wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> A malicious or buggy guest can cause another domain to spin
>> indefinitely by repeatedly writing to an event word when the other
>> domain is trying to link a new event.  The cmpxchg() in
>> evtchn_fifo_set_link() will repeatedly fail and the loop may never
>> terminate.
> 
> So here you talk of two guests (with me not immediately seeing
> where that interaction comes from - is it that for an interdomain
> event the receiver could harm the sender?), ...

Yes.  Guest A notifies guest M which requires linking a new event into
one of guest B's event queue.  While guest A is writing the guest M's
event array (to set the LINK field), guest M may repeatedly write to the
same event word, causing the cmpxchg() to repeatedly fail.

Guest A could also be the hypervisor for VIRQs and other events raised
by Xen.

>> Fixing this requires a minor change to the ABI, which is documented in
>> draft G of the design.
>>
>> http://xenbits.xen.org/people/dvrabel/event-channels-G.pdf 
>>
>> Since a well-behaved guest only makes a limited set of state changes,
>> the loop can terminate early if the guest makes an invalid state
>> transition.
>>
>> The guest may:
>>
>> - clear LINKED and link
>> - clear PENDING
>> - set MASKED
>> - clear MASKED
>>
>> It is valid for the guest to mask and unmask an event at any time so
>> we specify that it is not valid for a guest to clear MASKED if the
>> event is the tail of a queue (i.e., LINKED is set and LINK is clear).
>> Instead, the guest must make an EVCHNOP_unmask hypercall to unmask the
>> event.
>>
>> The hypercall ensures that UNMASKED isn't cleared on a tail event
>> whose LINK field is being set by holding the appropriate queue lock.
>>
>> The remaining valid writes (clear LINKED, clear PENDING, set MASKED)
>> will limit the number of failures of the cmpxchg() to at most 3.  A
>> clear of LINKED will also terminate the loop early (as before).
>> Therefore, the loop can then be limited to at most 3 iterations.
>>
>> If the buggy or malicious guest does cause the loop to exit early, the
>> newly pending event will be unreachable by the guest and it and
>> subsequent events may be lost.
> 
> ... yet here it is not really clear which guest the last "guest" refers
> to (i.e. it's fine if the malicious guest harms itself, but the change
> would be pointless if the malicious guest could still harm the other
> one).

The malicious guest loses the event.

>> @@ -90,6 +92,8 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
>>          event_word_t *tail_word;
>>          bool_t linked = 0;
>>  
>> +        evtchn->q = q;
>> +
>>          spin_lock_irqsave(&q->lock, flags);
>>  
>>          /*
> 
> I fail to see how this change is related to the rest of the patch.

This is needed so the correct queue lock is used in evtchn_fifo_unmask().

David

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK
  2013-11-04 14:52     ` David Vrabel
@ 2013-11-04 14:57       ` Jan Beulich
  2013-11-04 16:30         ` David Vrabel
  2013-11-04 15:07       ` Ian Campbell
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2013-11-04 14:57 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser

>>> On 04.11.13 at 15:52, David Vrabel <david.vrabel@citrix.com> wrote:
> On 04/11/13 14:39, Jan Beulich wrote:
>>>>> On 31.10.13 at 16:03, David Vrabel <david.vrabel@citrix.com> wrote:
>>> If the buggy or malicious guest does cause the loop to exit early, the
>>> newly pending event will be unreachable by the guest and it and
>>> subsequent events may be lost.
>> 
>> ... yet here it is not really clear which guest the last "guest" refers
>> to (i.e. it's fine if the malicious guest harms itself, but the change
>> would be pointless if the malicious guest could still harm the other
>> one).
> 
> The malicious guest loses the event.

So then "that guest" would perhaps be more precise than "the
guest"?

>>> @@ -90,6 +92,8 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
>>>          event_word_t *tail_word;
>>>          bool_t linked = 0;
>>>  
>>> +        evtchn->q = q;
>>> +
>>>          spin_lock_irqsave(&q->lock, flags);
>>>  
>>>          /*
>> 
>> I fail to see how this change is related to the rest of the patch.
> 
> This is needed so the correct queue lock is used in evtchn_fifo_unmask().

I guessed that, but I wasn't able to immediately see why that would
not have been a requirement before.

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK
  2013-11-04 14:57       ` Jan Beulich
@ 2013-11-04 16:30         ` David Vrabel
  2013-11-05 14:18           ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: David Vrabel @ 2013-11-04 16:30 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser

On 04/11/13 14:57, Jan Beulich wrote:
>>>> On 04.11.13 at 15:52, David Vrabel <david.vrabel@citrix.com> wrote:
>> On 04/11/13 14:39, Jan Beulich wrote:
>>>>>> On 31.10.13 at 16:03, David Vrabel <david.vrabel@citrix.com> wrote:
>>>> If the buggy or malicious guest does cause the loop to exit early, the
>>>> newly pending event will be unreachable by the guest and it and
>>>> subsequent events may be lost.
>>>
>>> ... yet here it is not really clear which guest the last "guest" refers
>>> to (i.e. it's fine if the malicious guest harms itself, but the change
>>> would be pointless if the malicious guest could still harm the other
>>> one).
>>
>> The malicious guest loses the event.
> 
> So then "that guest" would perhaps be more precise than "the
> guest"?

Yes.  I'll improve the commit message to be clearer.

>>>> @@ -90,6 +92,8 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
>>>>          event_word_t *tail_word;
>>>>          bool_t linked = 0;
>>>>  
>>>> +        evtchn->q = q;
>>>> +
>>>>          spin_lock_irqsave(&q->lock, flags);
>>>>  
>>>>          /*
>>>
>>> I fail to see how this change is related to the rest of the patch.
>>
>> This is needed so the correct queue lock is used in evtchn_fifo_unmask().
> 
> I guessed that, but I wasn't able to immediately see why that would
> not have been a requirement before.

An event now has two queues associated with it.

1. The queue it is supposed to be on.  This is found with
evtchn->notify_vcpu_id and evtchn->priority when an event is being linked.

2. The queue the event is on right now (the new evtchn->q), until this
patch we didn't care which queue an event was on, only that it was on
some queue.

These are not necessarily the same since an event may be moved between
VCPUs or have its priority changed while it is already on a queue.

Some additional notes on the locking:

The locking in the FIFO-based ABI is designed on the assumption that
every hypervisor operation on an event channel (set pending, unmask) is
done while holding a per-event channel spin lock.

Currently, the per-event locking is done with the coarser per-domain
event_lock.

The queue lock serializes adding events and clearing MASKED on tail events.

The event lock also protects evtchn->q and the evtchn->q->tail ==
evtchn->port test.

David

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK
  2013-11-04 16:30         ` David Vrabel
@ 2013-11-05 14:18           ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2013-11-05 14:18 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser

>>> On 04.11.13 at 17:30, David Vrabel <david.vrabel@citrix.com> wrote:
> On 04/11/13 14:57, Jan Beulich wrote:
>>>>> On 04.11.13 at 15:52, David Vrabel <david.vrabel@citrix.com> wrote:
>>> On 04/11/13 14:39, Jan Beulich wrote:
>>>>>>> On 31.10.13 at 16:03, David Vrabel <david.vrabel@citrix.com> wrote:
>>>>>          event_word_t *tail_word;
>>>>>          bool_t linked = 0;
>>>>>  
>>>>> +        evtchn->q = q;
>>>>> +
>>>>>          spin_lock_irqsave(&q->lock, flags);
>>>>>  
>>>>>          /*
>>>>
>>>> I fail to see how this change is related to the rest of the patch.
>>>
>>> This is needed so the correct queue lock is used in evtchn_fifo_unmask().
>> 
>> I guessed that, but I wasn't able to immediately see why that would
>> not have been a requirement before.
> 
> An event now has two queues associated with it.
> 
> 1. The queue it is supposed to be on.  This is found with
> evtchn->notify_vcpu_id and evtchn->priority when an event is being linked.
> 
> 2. The queue the event is on right now (the new evtchn->q), until this
> patch we didn't care which queue an event was on, only that it was on
> some queue.
> 
> These are not necessarily the same since an event may be moved between
> VCPUs or have its priority changed while it is already on a queue.
> 
> Some additional notes on the locking:
> 
> The locking in the FIFO-based ABI is designed on the assumption that
> every hypervisor operation on an event channel (set pending, unmask) is
> done while holding a per-event channel spin lock.
> 
> Currently, the per-event locking is done with the coarser per-domain
> event_lock.
> 
> The queue lock serializes adding events and clearing MASKED on tail events.
> 
> The event lock also protects evtchn->q and the evtchn->q->tail ==
> evtchn->port test.

But aren't you setting evtchn->q too late then?
evtchn_fifo_unmask() racing evtchn_fifo_set_pending() could
then lead to the latter seeing ->q still clear, the former setting it
and doing its linking work while in parallel the latter might clear
MASKED at any time.

Plus - shouldn't you clear ->q at some point again?

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK
  2013-11-04 14:52     ` David Vrabel
  2013-11-04 14:57       ` Jan Beulich
@ 2013-11-04 15:07       ` Ian Campbell
  2013-11-04 15:11         ` David Vrabel
  1 sibling, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2013-11-04 15:07 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser, Jan Beulich

On Mon, 2013-11-04 at 14:52 +0000, David Vrabel wrote:
> On 04/11/13 14:39, Jan Beulich wrote:
> >>>> On 31.10.13 at 16:03, David Vrabel <david.vrabel@citrix.com> wrote:
> >> From: David Vrabel <david.vrabel@citrix.com>
> >>
> >> A malicious or buggy guest can cause another domain to spin
> >> indefinitely by repeatedly writing to an event word when the other
> >> domain is trying to link a new event.  The cmpxchg() in
> >> evtchn_fifo_set_link() will repeatedly fail and the loop may never
> >> terminate.
> > 
> > So here you talk of two guests (with me not immediately seeing
> > where that interaction comes from - is it that for an interdomain
> > event the receiver could harm the sender?), ...
> 
> Yes.  Guest A notifies guest M which requires linking a new event into
> one of guest B's event queue.  While guest A is writing the guest M's
> event array (to set the LINK field), guest M may repeatedly write to the
> same event word, causing the cmpxchg() to repeatedly fail.

M == B here?

Ian.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK
  2013-11-04 15:07       ` Ian Campbell
@ 2013-11-04 15:11         ` David Vrabel
  0 siblings, 0 replies; 25+ messages in thread
From: David Vrabel @ 2013-11-04 15:11 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Keir Fraser, Jan Beulich

On 04/11/13 15:07, Ian Campbell wrote:
> On Mon, 2013-11-04 at 14:52 +0000, David Vrabel wrote:
>> On 04/11/13 14:39, Jan Beulich wrote:
>>>>>> On 31.10.13 at 16:03, David Vrabel <david.vrabel@citrix.com> wrote:
>>>> From: David Vrabel <david.vrabel@citrix.com>
>>>>
>>>> A malicious or buggy guest can cause another domain to spin
>>>> indefinitely by repeatedly writing to an event word when the other
>>>> domain is trying to link a new event.  The cmpxchg() in
>>>> evtchn_fifo_set_link() will repeatedly fail and the loop may never
>>>> terminate.
>>>
>>> So here you talk of two guests (with me not immediately seeing
>>> where that interaction comes from - is it that for an interdomain
>>> event the receiver could harm the sender?), ...
>>
>> Yes.  Guest A notifies guest M which requires linking a new event into
>> one of guest B's event queue.  While guest A is writing the guest M's
>> event array (to set the LINK field), guest M may repeatedly write to the
>> same event word, causing the cmpxchg() to repeatedly fail.
> 
> M == B here?

Yes. I originally had B then changed it to M for Malicious to be clearer...

David

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK
  2013-10-31 15:03 ` [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK David Vrabel
  2013-10-31 18:13   ` Boris Ostrovsky
  2013-11-04 14:39   ` Jan Beulich
@ 2013-11-05 14:19   ` Jan Beulich
  2013-11-05 14:25   ` Jan Beulich
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2013-11-05 14:19 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser

>>> On 31.10.13 at 16:03, David Vrabel <david.vrabel@citrix.com> wrote:
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -68,6 +68,8 @@ extern struct domain *dom0;
>  #define EVTCHNS_PER_GROUP  (BUCKETS_PER_GROUP * EVTCHNS_PER_BUCKET)
>  #define NR_EVTCHN_GROUPS   DIV_ROUND_UP(MAX_NR_EVTCHNS, EVTCHNS_PER_GROUP)
>  
> +struct evtchn_fifo_queue;
> +

This is superfluous in C, since ...

>  struct evtchn
>  {
>  #define ECS_FREE         0 /* Channel is available for use.                  */
> @@ -98,6 +100,7 @@ struct evtchn
>      } u;
>      u8 priority;
>      u8 pending:1;
> +    struct evtchn_fifo_queue *q; /* Latest queue this event was on. */

... this already puts it at global scope (other than in C++).

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK
  2013-10-31 15:03 ` [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK David Vrabel
                     ` (2 preceding siblings ...)
  2013-11-05 14:19   ` Jan Beulich
@ 2013-11-05 14:25   ` Jan Beulich
  2013-11-06 13:38   ` David Vrabel
  2013-11-10 21:21   ` Matt Wilson
  5 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2013-11-05 14:25 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser

>>> On 31.10.13 at 16:03, David Vrabel <david.vrabel@citrix.com> wrote:
> @@ -98,6 +100,7 @@ struct evtchn
>      } u;
>      u8 priority;
>      u8 pending:1;
> +    struct evtchn_fifo_queue *q; /* Latest queue this event was on. */

Oh, and - just like for the original q you had here in an earlier
version of the FIFO patches - I think you could get away with
again just storing the "current" priority (8 bits) instead of the
queue pointer (8 bytes).

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK
  2013-10-31 15:03 ` [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK David Vrabel
                     ` (3 preceding siblings ...)
  2013-11-05 14:25   ` Jan Beulich
@ 2013-11-06 13:38   ` David Vrabel
  2013-11-06 15:01     ` Boris Ostrovsky
  2013-11-10 21:21   ` Matt Wilson
  5 siblings, 1 reply; 25+ messages in thread
From: David Vrabel @ 2013-11-06 13:38 UTC (permalink / raw)
  To: David Vrabel; +Cc: Keir Fraser, Jan Beulich, xen-devel

On 31/10/13 15:03, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
> 
> A malicious or buggy guest can cause another domain to spin
> indefinitely by repeatedly writing to an event word when the other
> domain is trying to link a new event.  The cmpxchg() in
> evtchn_fifo_set_link() will repeatedly fail and the loop may never
> terminate.
> 
> Fixing this requires a minor change to the ABI, which is documented in
> draft G of the design.
> 
> http://xenbits.xen.org/people/dvrabel/event-channels-G.pdf
> 
> Since a well-behaved guest only makes a limited set of state changes,
> the loop can terminate early if the guest makes an invalid state
> transition.
> 
> The guest may:
> 
> - clear LINKED and link
> - clear PENDING
> - set MASKED
> - clear MASKED
> 
> It is valid for the guest to mask and unmask an event at any time so
> we specify that it is not valid for a guest to clear MASKED if the
> event is the tail of a queue (i.e., LINKED is set and LINK is clear).
> Instead, the guest must make an EVCHNOP_unmask hypercall to unmask the
> event.

Given the non-obvious locking required for this to be safe and the
overhead of the guest having to do a unmask hypercall more often.  I
think I will fix this differently.

A new BUSY bit is added to the event word.  Xen sets BUSY prior to
updating the LINK field and then clears it when the LINK field is set.

When the guest unmasks an event it must spin waiting for BUSY to clear
before it clears the MASKED bit.  It then need only do the unmask
hypercall if the event is pending (as before).

Draft H and v3 of this series to follow.

David

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK
  2013-11-06 13:38   ` David Vrabel
@ 2013-11-06 15:01     ` Boris Ostrovsky
  2013-11-06 15:07       ` David Vrabel
  0 siblings, 1 reply; 25+ messages in thread
From: Boris Ostrovsky @ 2013-11-06 15:01 UTC (permalink / raw)
  To: David Vrabel; +Cc: Keir Fraser, Jan Beulich, xen-devel

On 11/06/2013 08:38 AM, David Vrabel wrote:
> On 31/10/13 15:03, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> A malicious or buggy guest can cause another domain to spin
>> indefinitely by repeatedly writing to an event word when the other
>> domain is trying to link a new event.  The cmpxchg() in
>> evtchn_fifo_set_link() will repeatedly fail and the loop may never
>> terminate.
>>
>> Fixing this requires a minor change to the ABI, which is documented in
>> draft G of the design.
>>
>> http://xenbits.xen.org/people/dvrabel/event-channels-G.pdf
>>
>> Since a well-behaved guest only makes a limited set of state changes,
>> the loop can terminate early if the guest makes an invalid state
>> transition.
>>
>> The guest may:
>>
>> - clear LINKED and link
>> - clear PENDING
>> - set MASKED
>> - clear MASKED
>>
>> It is valid for the guest to mask and unmask an event at any time so
>> we specify that it is not valid for a guest to clear MASKED if the
>> event is the tail of a queue (i.e., LINKED is set and LINK is clear).
>> Instead, the guest must make an EVCHNOP_unmask hypercall to unmask the
>> event.
> Given the non-obvious locking required for this to be safe and the
> overhead of the guest having to do a unmask hypercall more often.  I
> think I will fix this differently.
>
> A new BUSY bit is added to the event word.  Xen sets BUSY prior to
> updating the LINK field and then clears it when the LINK field is set.
>
> When the guest unmasks an event it must spin waiting for BUSY to clear
> before it clears the MASKED bit.  It then need only do the unmask
> hypercall if the event is pending (as before).
>
> Draft H and v3 of this series to follow.

Will there be a new Linux series as well?

-boris

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK
  2013-11-06 15:01     ` Boris Ostrovsky
@ 2013-11-06 15:07       ` David Vrabel
  0 siblings, 0 replies; 25+ messages in thread
From: David Vrabel @ 2013-11-06 15:07 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: Keir Fraser, David Vrabel, Jan Beulich, xen-devel

On 06/11/13 15:01, Boris Ostrovsky wrote:
> On 11/06/2013 08:38 AM, David Vrabel wrote:
>> 
>> Given the non-obvious locking required for this to be safe and the
>> overhead of the guest having to do a unmask hypercall more often.  I
>> think I will fix this differently.
>>
>> A new BUSY bit is added to the event word.  Xen sets BUSY prior to
>> updating the LINK field and then clears it when the LINK field is set.
>>
>> When the guest unmasks an event it must spin waiting for BUSY to clear
>> before it clears the MASKED bit.  It then need only do the unmask
>> hypercall if the event is pending (as before).
>>
>> Draft H and v3 of this series to follow.
> 
> Will there be a new Linux series as well?

Yes, but only the final patch in the series will change.

David

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK
  2013-10-31 15:03 ` [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK David Vrabel
                     ` (4 preceding siblings ...)
  2013-11-06 13:38   ` David Vrabel
@ 2013-11-10 21:21   ` Matt Wilson
  5 siblings, 0 replies; 25+ messages in thread
From: Matt Wilson @ 2013-11-10 21:21 UTC (permalink / raw)
  To: David Vrabel
  Cc: Anthony Liguori, Keir Fraser, Anthony Liguori, Jan Beulich,
	xen-devel

On Thu, Oct 31, 2013 at 03:03:11PM +0000, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
> 
> A malicious or buggy guest can cause another domain to spin
> indefinitely by repeatedly writing to an event word when the other
> domain is trying to link a new event.  The cmpxchg() in
> evtchn_fifo_set_link() will repeatedly fail and the loop may never
> terminate.
> 
> Fixing this requires a minor change to the ABI, which is documented in
> draft G of the design.
> 
> http://xenbits.xen.org/people/dvrabel/event-channels-G.pdf
> 
> Since a well-behaved guest only makes a limited set of state changes,
> the loop can terminate early if the guest makes an invalid state
> transition.
> 
> The guest may:
> 
> - clear LINKED and link
> - clear PENDING
> - set MASKED
> - clear MASKED
> 
> It is valid for the guest to mask and unmask an event at any time so
> we specify that it is not valid for a guest to clear MASKED if the
> event is the tail of a queue (i.e., LINKED is set and LINK is clear).
> Instead, the guest must make an EVCHNOP_unmask hypercall to unmask the
> event.
> 
> The hypercall ensures that UNMASKED isn't cleared on a tail event
> whose LINK field is being set by holding the appropriate queue lock.
> 
> The remaining valid writes (clear LINKED, clear PENDING, set MASKED)
> will limit the number of failures of the cmpxchg() to at most 3.  A
> clear of LINKED will also terminate the loop early (as before).
> Therefore, the loop can then be limited to at most 3 iterations.
> 
> If the buggy or malicious guest does cause the loop to exit early, the
> newly pending event will be unreachable by the guest and it and
> subsequent events may be lost.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Wasn't this

Reported-by: Anthony Liguori <aliguori@amazon.com>

at Xen Developer Summit?

--msw

> ---
>  xen/common/event_fifo.c |   19 +++++++++++++++++--
>  xen/include/xen/sched.h |    3 +++
>  2 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
> index 64dbfff..6cf30ec 100644
> --- a/xen/common/event_fifo.c
> +++ b/xen/common/event_fifo.c
> @@ -37,6 +37,7 @@ static inline event_word_t *evtchn_fifo_word_from_port(struct domain *d,
>  static bool_t evtchn_fifo_set_link(event_word_t *word, uint32_t link)
>  {
>      event_word_t n, o, w;
> +    unsigned int try = 3;
>  
>      w = *word;
>  
> @@ -45,7 +46,8 @@ static bool_t evtchn_fifo_set_link(event_word_t *word, uint32_t link)
>              return 0;
>          o = w;
>          n = (w & ~EVTCHN_FIFO_LINK_MASK) | link;
> -    } while ( (w = cmpxchg(word, o, n)) != o );
> +        w = cmpxchg(word, o, n);
> +    } while ( w != o && try-- );
>  
>      return 1;
>  }
> @@ -90,6 +92,8 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
>          event_word_t *tail_word;
>          bool_t linked = 0;
>  
> +        evtchn->q = q;
> +
>          spin_lock_irqsave(&q->lock, flags);
>  
>          /*
> @@ -148,7 +152,18 @@ static void evtchn_fifo_unmask(struct domain *d, struct evtchn *evtchn)
>      if ( unlikely(!word) )
>          return;
>  
> -    clear_bit(EVTCHN_FIFO_MASKED, word);
> +    /*
> +     * If this event is on the tail of a queue, the queue lock must be
> +     * held to avoid clearing MASKED while setting LINK.
> +     */
> +    if ( evtchn->q && evtchn->q->tail == evtchn->port )
> +    {
> +        spin_lock_irq(&evtchn->q->lock);
> +        clear_bit(EVTCHN_FIFO_MASKED, word);
> +        spin_unlock_irq(&evtchn->q->lock);
> +    }
> +    else
> +        clear_bit(EVTCHN_FIFO_MASKED, word);
>  
>      /* Relink if pending. */
>      if ( test_bit(EVTCHN_FIFO_PENDING, word) )
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 624ea15..7b0273a 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -68,6 +68,8 @@ extern struct domain *dom0;
>  #define EVTCHNS_PER_GROUP  (BUCKETS_PER_GROUP * EVTCHNS_PER_BUCKET)
>  #define NR_EVTCHN_GROUPS   DIV_ROUND_UP(MAX_NR_EVTCHNS, EVTCHNS_PER_GROUP)
>  
> +struct evtchn_fifo_queue;
> +
>  struct evtchn
>  {
>  #define ECS_FREE         0 /* Channel is available for use.                  */
> @@ -98,6 +100,7 @@ struct evtchn
>      } u;
>      u8 priority;
>      u8 pending:1;
> +    struct evtchn_fifo_queue *q; /* Latest queue this event was on. */
>  #ifdef FLASK_ENABLE
>      void *ssid;
>  #endif

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 0/3] Xen: FIFO-based event channel ABI fixes
  2013-10-31 15:03 [PATCHv2 0/3] Xen: FIFO-based event channel ABI fixes David Vrabel
                   ` (2 preceding siblings ...)
  2013-10-31 15:03 ` [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK David Vrabel
@ 2013-10-31 15:13 ` David Vrabel
  3 siblings, 0 replies; 25+ messages in thread
From: David Vrabel @ 2013-10-31 15:13 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: Keir Fraser, Jan Beulich

Sorry Jan, wrong email address.

On 31/10/2013 15:03, David Vrabel wrote:
> This series addresses two design flaws in the FIFO-based event channel
> ABI.
> 
> 1. Allow guests to expand the array after binding events.
> 
> 2. Fix a potential DoS caused by an unbounded loop when setting LINK.
> 
> An updated design document is available from:
> 
> http://xenbits.xen.org/people/dvrabel/event-channels-G.pdf
> 
> - Add section on memory barriers.
> - EVTCHNOP expand array should be called after binding events.
> - Guest clearing MASKED on a tail event is no longer valid, a
>   hypercall is required. This allows the loop in the Xen's link()
>   function to be bounded.
> 
> v8 of the Linux patches will be posted shortly.
> 
> Changes since v1:
> - Add MAINTAINERS patch
> - Remove some unnecessary temporary pending state clears
> - Add fix for DoS
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2013-11-10 21:21 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-31 15:03 [PATCHv2 0/3] Xen: FIFO-based event channel ABI fixes David Vrabel
2013-10-31 15:03 ` [PATCH 1/3] MAINTAINERS: Add FIFO-based event channel ABI maintainer David Vrabel
2013-11-04 14:29   ` Jan Beulich
2013-11-05 21:06     ` Keir Fraser
2013-11-06 11:49       ` David Vrabel
2013-11-06 12:40         ` Jan Beulich
2013-10-31 15:03 ` [PATCH 2/3] evtchn: don't lose pending state if FIFO event array page is missing David Vrabel
2013-11-04 14:29   ` Jan Beulich
2013-11-05 21:07     ` Keir Fraser
2013-10-31 15:03 ` [PATCH 3/3] evtchn/fifo: don't spin indefinitely when setting LINK David Vrabel
2013-10-31 18:13   ` Boris Ostrovsky
2013-11-04 14:39   ` Jan Beulich
2013-11-04 14:52     ` David Vrabel
2013-11-04 14:57       ` Jan Beulich
2013-11-04 16:30         ` David Vrabel
2013-11-05 14:18           ` Jan Beulich
2013-11-04 15:07       ` Ian Campbell
2013-11-04 15:11         ` David Vrabel
2013-11-05 14:19   ` Jan Beulich
2013-11-05 14:25   ` Jan Beulich
2013-11-06 13:38   ` David Vrabel
2013-11-06 15:01     ` Boris Ostrovsky
2013-11-06 15:07       ` David Vrabel
2013-11-10 21:21   ` Matt Wilson
2013-10-31 15:13 ` [PATCHv2 0/3] Xen: FIFO-based event channel ABI fixes David Vrabel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).