* [PATCHv5 0/2] Xen: FIFO-based event channel ABI fixes
  From: David Vrabel @ 2013-11-19 18:17 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, David Vrabel, Jan Beulich

Two further fixes for bugs in the FIFO-based event channel implementation.

1. Correctly set READY bits to avoid races with guests emptying queues.

2. Handle problems with relinking events that were the tails of now-empty
   queues.  This fixes more cases than a previously posted patch.

I have now backported the FIFO-based event channel patches (including the
above fixes) to Xen 4.3 and Linux 3.10 and run them through some of
XenServer's system tests.  Approximately 40-50 machine hours of dom0
testing have been done with a variety of guests and workloads.  No event
channel failures were found.

Changes in v5:
- Only set READY bits for new heads.
- Rework old tail bug fix to cover all cases.

Changes in v4:
- const struct domain *
- Clear BUSY with existing cmpxchg() where possible.
- Fix BUSY bit debug output.

Changes in v3:
- Use a new BUSY bit to block guests from clearing UNMASKED; this is
  lower overhead than the previous solution (which required a hypercall).
- Fix another problem with moving events between queues.
- Add evtchn->last_vcpu_id and evtchn->last_priority instead of
  evtchn->q.  This keeps the structure at 32 bytes long.

Changes in v2:
- Add MAINTAINERS patch
- Remove some unnecessary temporary pending state clears
- Add fix for DoS

David

^ permalink raw reply	[flat|nested] 10+ messages in thread
* [PATCH 1/2] evtchn/fifo: only set READY for new heads
  From: David Vrabel @ 2013-11-19 18:17 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, David Vrabel, Jan Beulich

From: David Vrabel <david.vrabel@citrix.com>

Setting a queue's READY bit for every event added to the queue
introduces a race.  If an event is added to the tail of a queue, the
guest may consume the newly added event and leave an empty queue before
READY is set.  The guest may then see a stale HEAD value, and if the
event at the stale head became linked onto a different queue, the guest
would consume events from the wrong queue (corrupting it).

As noted in section 4.1.2 of the design document, only set READY if a
new HEAD is set.  This ensures that if the guest sees a READY bit set,
the corresponding HEAD is valid.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/event_fifo.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
index 9106c55..6048784 100644
--- a/xen/common/event_fifo.c
+++ b/xen/common/event_fifo.c
@@ -161,8 +161,9 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn)
 
     spin_unlock_irqrestore(&q->lock, flags);
 
-    if ( !test_and_set_bit(q->priority,
-                           &v->evtchn_fifo->control_block->ready) )
+    if ( !linked
+         && !test_and_set_bit(q->priority,
+                              &v->evtchn_fifo->control_block->ready) )
         vcpu_mark_events_pending(v);
 }
-- 
1.7.2.5
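The guarded notification in the hunk above can be modeled in user space. This is a minimal sketch, not Xen code: `test_and_set_bit_model` is a hypothetical stand-in for the hypervisor's atomic bit helper, and the READY word is a plain `uint32_t` rather than the shared control block.

```c
#include <stdint.h>

/* Stand-in for Xen's test_and_set_bit(): set bit nr, return its old value. */
static int test_and_set_bit_model(unsigned int nr, uint32_t *word)
{
    uint32_t old = *word;

    *word |= (uint32_t)1 << nr;
    return (old >> nr) & 1;
}

/*
 * Model of the patched guard: READY is only set (and the VCPU only
 * notified) when this call installed a new HEAD, i.e. when the event
 * was *not* linked onto an existing tail.  Returns 1 if the model
 * would call vcpu_mark_events_pending().
 */
static int maybe_mark_ready(int linked, unsigned int priority, uint32_t *ready)
{
    if ( !linked && !test_and_set_bit_model(priority, ready) )
        return 1;
    return 0;
}
```

An event linked onto an existing tail (`linked != 0`) never touches READY, so the guest can only observe READY together with a valid HEAD.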
* [PATCH 2/2] evtchn/fifo: don't corrupt queues if an old tail is linked
  From: David Vrabel @ 2013-11-19 18:17 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, David Vrabel, Jan Beulich

From: David Vrabel <david.vrabel@citrix.com>

An event may still be the tail of a queue even if the queue is now
empty (an 'old tail' event).  There is logic to handle the case when
this old tail event needs to be added to the now empty queue (by
checking for q->tail == port).

However, this does not cover all cases.

1. An old tail may be re-added simultaneously with another event.
   LINKED is set on the old tail, and the other CPU may misinterpret
   this as the old tail still being valid and set LINK instead of
   HEAD.  All events on this queue will then be lost.

2. If the old tail event on queue A is moved to a different queue B
   (by changing its VCPU or priority), the event may then be linked
   onto queue B.  When another event is linked onto queue A it will
   check the old tail, see that it is linked (but on queue B) and
   overwrite the LINK field, corrupting both queues.

When an event is linked, save the vcpu id and priority of the queue it
is being linked onto.  Use this when linking an event to check if it
is an unlinked old tail event.  If it is an old tail event, the old
queue is empty and old_q->tail is invalidated to ensure adding another
event to old_q will update HEAD.  The tail is invalidated by setting
it to 0, since event 0 is never linked.

The old_q->lock is held while setting LINKED to avoid the race with
the test of LINKED in evtchn_fifo_set_link().
Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/common/event_fifo.c | 46 +++++++++++++++++++++++++++++++++++----------- xen/include/xen/sched.h | 2 ++ 2 files changed, 37 insertions(+), 11 deletions(-) diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c index 6048784..7a6e360 100644 --- a/xen/common/event_fifo.c +++ b/xen/common/event_fifo.c @@ -103,7 +103,6 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) struct domain *d = v->domain; unsigned int port; event_word_t *word; - struct evtchn_fifo_queue *q; unsigned long flags; bool_t was_pending; @@ -120,25 +119,50 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) return; } - /* - * No locking around getting the queue. This may race with - * changing the priority but we are allowed to signal the event - * once on the old priority. - */ - q = &v->evtchn_fifo->queue[evtchn->priority]; - was_pending = test_and_set_bit(EVTCHN_FIFO_PENDING, word); /* * Link the event if it unmasked and not already linked. */ if ( !test_bit(EVTCHN_FIFO_MASKED, word) - && !test_and_set_bit(EVTCHN_FIFO_LINKED, word) ) + && !test_bit(EVTCHN_FIFO_LINKED, word) ) { + struct vcpu *old_v; + struct evtchn_fifo_queue *q, *old_q; event_word_t *tail_word; bool_t linked = 0; - spin_lock_irqsave(&q->lock, flags); + /* + * No locking around getting the queue. This may race with + * changing the priority but we are allowed to signal the + * event once on the old priority. + */ + q = &v->evtchn_fifo->queue[evtchn->priority]; + + old_v = d->vcpu[evtchn->last_vcpu_id]; + old_q = &old_v->evtchn_fifo->queue[evtchn->last_priority]; + + spin_lock_irqsave(&old_q->lock, flags); + + /* + * If this event was a tail, the old queue is now empty and + * its tail must be invalidated to prevent adding an event to + * the old queue from corrupting the new queue. + */ + if ( old_q->tail == port ) + old_q->tail = 0; + + set_bit(EVTCHN_FIFO_LINKED, word); + + /* Moved to a different queue? 
*/ + if ( old_q != q) + { + evtchn->last_vcpu_id = evtchn->notify_vcpu_id; + evtchn->last_priority = evtchn->priority; + + spin_unlock_irqrestore(&old_q->lock, flags); + spin_lock_irqsave(&q->lock, flags); + } /* * Atomically link the tail to port iff the tail is linked. @@ -150,7 +174,7 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) * If the queue is empty (i.e., we haven't linked to the new * event), head must be updated. */ - if ( port != q->tail ) + if ( q->tail ) { tail_word = evtchn_fifo_word_from_port(d, q->tail); linked = evtchn_fifo_set_link(d, tail_word, port); diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index cbdf377..5ab92dd 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -98,6 +98,8 @@ struct evtchn } u; u8 priority; u8 pending:1; + u16 last_vcpu_id; + u8 last_priority; #ifdef FLASK_ENABLE void *ssid; #endif -- 1.7.2.5 ^ permalink raw reply related [flat|nested] 10+ messages in thread
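The "atomically link the tail to port iff the tail is linked" step mentioned in the hunk above relies on a compare-and-swap loop (Xen's `try_set_link()`). Below is a user-space sketch of that idea. The bit positions and field width are illustrative assumptions, not necessarily the ABI's exact layout, and the function name is a hypothetical stand-in.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef uint32_t event_word_t;

#define MODEL_LINKED    (1u << 29)   /* illustrative LINKED flag bit */
#define MODEL_LINK_MASK 0x1ffffu     /* illustrative LINK field (low bits) */

/*
 * Sketch of try_set_link(): rewrite the LINK field of *word to 'link',
 * but only while LINKED is still set.  Returns 1 if the link was
 * written, 0 if the word became unlinked in the meantime (e.g. the
 * guest consumed the event), in which case the caller must install a
 * new HEAD instead.
 */
static int try_set_link_model(_Atomic event_word_t *word, uint32_t link)
{
    event_word_t old = atomic_load(word);

    for ( ;; )
    {
        event_word_t new;

        if ( !(old & MODEL_LINKED) )
            return 0;                 /* tail is no longer linked */

        new = (old & ~MODEL_LINK_MASK) | (link & MODEL_LINK_MASK);

        /* On failure, 'old' is refreshed with the current value. */
        if ( atomic_compare_exchange_weak(word, &old, new) )
            return 1;
    }
}
```

The CAS retries only while LINKED stays set, which is exactly why the guest clearing LINKED concurrently forces the producer down the new-HEAD path rather than corrupting the link chain.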
* Re: [PATCH 2/2] evtchn/fifo: don't corrupt queues if an old tail is linked 2013-11-19 18:17 ` [PATCH 2/2] evtchn/fifo: don't corrupt queues if an old tail is linked David Vrabel @ 2013-11-20 17:21 ` David Vrabel 2013-11-22 12:02 ` Jan Beulich 0 siblings, 1 reply; 10+ messages in thread From: David Vrabel @ 2013-11-20 17:21 UTC (permalink / raw) To: David Vrabel; +Cc: Keir Fraser, Jan Beulich, xen-devel On 19/11/13 18:17, David Vrabel wrote: > From: David Vrabel <david.vrabel@citrix.com> > > An event may still be the tail of a queue even if the queue is now > empty (an 'old tail' event). There is logic to handle the case when > this old tail event needs to be added to the now empty queue (by > checking for q->tail == port). > > However, this does not cover all cases. > > 1. An old tail may be re-added simultaneously with another event. > LINKED is set on the old tail, and the other CPU may misinterpret > this as the old tail still being valid and set LINK instead of > HEAD. All events on this queue will then be lost. > > 2. If the old tail event on queue A is moved to a different queue B > (by changing its VCPU or priority), the event may then be linked > onto queue B. When another event is linked onto queue A it will > check the old tail, see that it is linked (but on queue B) and > overwrite the LINK field, corrupting both queues. > > When an event is linked, save the vcpu id and priority of the queue it > is being linked onto. Use this when linking an event to check if it > is an unlinked old tail event. If it is an old tail event, the old > queue is empty and old_q->tail is invalidated to ensure adding another > event to old_q will update HEAD. The tail is invalidated by setting > it to 0 since the event 0 is never linked. > > The old_q->lock is held while setting LINKED to avoid the race with > the test of LINKED in evtchn_fifo_set_link(). Whilst this is considerably more reliable than before it isn't completely safe. 
evtchn_fifo_set_pending() relies on being serialized for each event
channel.  The serialization is required to protect evtchn->last_* and
the split test_bit(LINKED) and set_bit(LINKED).

In most cases the serialization is done by a suitable lock in
event_channel.c; e.g., interdomain events are serialized with the
remote domain's event lock, virqs by the local virq lock.  However, if
evtchn_fifo_set_pending() is called from evtchn_fifo_unmask() or
evtchn_fifo_expand_array() then only the local event lock is held,
which is different from the lock used for interdomain and virq events.

One race is:

    CPU A                              CPU B
    EVTCHNOP_send()
      evtchn_fifo_set_pending()
        evtchn->last_vcpu = ...
                                       old_v = d->vcpu[evtchn->last_vcpu_id]
                                       old_q = old_v->queue[evtchn->last_pri]
        evtchn->last_pri = ...
                                       lock(old_q)       // wrong q
        set_bit(LINKED)
        q->tail = port
        ...
    Guest: clear_bit(LINKED)
                                       set_bit(LINKED)   // q->tail not invalidated
                                       unlock(old_q)
                                       lock(q)
                                       link port to itself
                                       unlock(q)

Linking an event to itself will lose any other events that were in the
queue (they're LINKED but not reachable by the guest).

Another race is:

    CPU A                              CPU B
    EVTCHNOP_send()
      evtchn_fifo_set_pending()
                                       clear_bit(MASKED)
        set_bit(PENDING)
                                       test_bit(PENDING)
        test_bit(LINKED)
                                       EVTCHNOP_unmask()
                                         evtchn_fifo_set_pending()
                                           test_bit(LINKED)
                                           lock(q->lock)
                                           set_bit(LINKED)
                                           link to q->tail
                                           q->tail = port
                                           unlock(q->lock)
        lock(q->lock)
        q->tail = 0 because q->tail == port
        q->head = port
        unlock(q->lock)

The double link of the event loses any other event preceding it in the
queue; those events are lost (they have LINKED set but are no longer
reachable by the guest).

There are two ways to fix this:

1. Introduce a per-event lock and use this to serialize
   evtchn_fifo_set_pending().  After 4.4, this per-event lock can be
   used instead of the domain's event lock and virq lock when raising
   events, hopefully reducing lock contention and improving event
   channel scalability.
   This would halve the number of struct evtchn per page as the lock
   takes it over 32 bytes in size.

2. Check for old_q changing after locking old_q->lock and use
   test_and_set_bit(LINKED) to bail early if another CPU linked it
   (see below patch).

Any opinions on either of these solutions?

--- a/xen/common/event_fifo.c	Tue Nov 19 11:06:54 2013 +0000
+++ b/xen/common/event_fifo.c	Wed Nov 20 16:41:32 2013 +0000
@@ -34,6 +34,30 @@ static inline event_word_t *evtchn_fifo_
     return d->evtchn_fifo->event_array[p] + w;
 }
 
+static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
+                                                struct evtchn *evtchn,
+                                                unsigned long *flags)
+{
+    struct vcpu *v;
+    struct evtchn_fifo_queue *q, *old_q;
+
+    for (;;)
+    {
+        v = d->vcpu[evtchn->last_vcpu_id];
+        old_q = &v->evtchn_fifo->queue[evtchn->last_priority];
+
+        spin_lock_irqsave(&old_q->lock, *flags);
+
+        v = d->vcpu[evtchn->last_vcpu_id];
+        q = &v->evtchn_fifo->queue[evtchn->last_priority];
+
+        if ( old_q == q )
+            return old_q;
+
+        spin_unlock_irqrestore(&old_q->lock, *flags);
+    }
+}
+
 static int try_set_link(event_word_t *word, event_word_t *w, uint32_t link)
 {
     event_word_t new, old;
@@ -127,7 +151,6 @@ static void evtchn_fifo_set_pending(stru
     if ( !test_bit(EVTCHN_FIFO_MASKED, word)
          && !test_bit(EVTCHN_FIFO_LINKED, word) )
     {
-        struct vcpu *old_v;
         struct evtchn_fifo_queue *q, *old_q;
         event_word_t *tail_word;
         bool_t linked = 0;
@@ -139,10 +162,13 @@ static void evtchn_fifo_set_pending(stru
          */
         q = &v->evtchn_fifo->queue[evtchn->priority];
 
-        old_v = d->vcpu[evtchn->last_vcpu_id];
-        old_q = &old_v->evtchn_fifo->queue[evtchn->last_priority];
+        old_q = lock_old_queue(d, evtchn, &flags);
 
-        spin_lock_irqsave(&old_q->lock, flags);
+        if ( test_and_set_bit(EVTCHN_FIFO_LINKED, word) )
+        {
+            spin_unlock_irqrestore(&old_q->lock, flags);
+            goto done;
+        }
 
         /*
          * If this event was a tail, the old queue is now empty and
@@ -152,12 +178,9 @@ static void evtchn_fifo_set_pending(stru
         if ( old_q->tail == port )
             old_q->tail = 0;
 
-        set_bit(EVTCHN_FIFO_LINKED, word);
-
         /* Moved to a different queue? */
-        if ( old_q != q)
+        if ( old_q != q )
         {
-
             evtchn->last_vcpu_id = evtchn->notify_vcpu_id;
             evtchn->last_priority = evtchn->priority;
 
@@ -191,7 +214,7 @@ static void evtchn_fifo_set_pending(stru
              &v->evtchn_fifo->control_block->ready) )
             vcpu_mark_events_pending(v);
     }
-
+done:
     if ( !was_pending )
         evtchn_check_pollers(d, port);
 }

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] evtchn/fifo: don't corrupt queues if an old tail is linked 2013-11-20 17:21 ` David Vrabel @ 2013-11-22 12:02 ` Jan Beulich 2013-11-22 18:23 ` David Vrabel 0 siblings, 1 reply; 10+ messages in thread From: Jan Beulich @ 2013-11-22 12:02 UTC (permalink / raw) To: David Vrabel; +Cc: Keir Fraser, xen-devel >>> On 20.11.13 at 18:21, David Vrabel <david.vrabel@citrix.com> wrote: > 2. Check for the old_q changing after locking old_q->lock and use > test_and_set_bit(LINKED) to bail early if another CPU linked it (see > below patch). > > Any opinions on either of these solutions? I'd favor 2, but ... > --- a/xen/common/event_fifo.c Tue Nov 19 11:06:54 2013 +0000 > +++ b/xen/common/event_fifo.c Wed Nov 20 16:41:32 2013 +0000 > @@ -34,6 +34,30 @@ static inline event_word_t *evtchn_fifo_ > return d->evtchn_fifo->event_array[p] + w; > } > > +static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d, > + struct evtchn *evtchn, > + unsigned long *flags) > +{ > + struct vcpu *v; > + struct evtchn_fifo_queue *q, *old_q; > + > + for (;;) > + { > + v = d->vcpu[evtchn->last_vcpu_id]; > + old_q = &v->evtchn_fifo->queue[evtchn->last_priority]; > + > + spin_lock_irqsave(&old_q->lock, *flags); > + > + v = d->vcpu[evtchn->last_vcpu_id]; > + q = &v->evtchn_fifo->queue[evtchn->last_priority]; > + > + if ( old_q == q ) > + return old_q; > + > + spin_unlock_irqrestore(&old_q->lock, *flags); > + } ... is there a guaranteed upper bound to this loop? 
> +} > + > static int try_set_link(event_word_t *word, event_word_t *w, uint32_t link) > { > event_word_t new, old; > @@ -127,7 +151,6 @@ static void evtchn_fifo_set_pending(stru > if ( !test_bit(EVTCHN_FIFO_MASKED, word) > && !test_bit(EVTCHN_FIFO_LINKED, word) ) > { > - struct vcpu *old_v; > struct evtchn_fifo_queue *q, *old_q; > event_word_t *tail_word; > bool_t linked = 0; > @@ -139,10 +162,13 @@ static void evtchn_fifo_set_pending(stru > */ > q = &v->evtchn_fifo->queue[evtchn->priority]; > > - old_v = d->vcpu[evtchn->last_vcpu_id]; > - old_q = &old_v->evtchn_fifo->queue[evtchn->last_priority]; > + old_q = lock_old_queue(d, evtchn, &flags); > > - spin_lock_irqsave(&old_q->lock, flags); > + if ( test_and_set_bit(EVTCHN_FIFO_LINKED, word) ) > + { > + spin_unlock_irqrestore(&old_q->lock, flags); > + goto done; > + } > > /* > * If this event was a tail, the old queue is now empty and > @@ -152,12 +178,9 @@ static void evtchn_fifo_set_pending(stru > if ( old_q->tail == port ) > old_q->tail = 0; > > - set_bit(EVTCHN_FIFO_LINKED, word); > - > /* Moved to a different queue? */ > - if ( old_q != q) > + if ( old_q != q ) > { > - > evtchn->last_vcpu_id = evtchn->notify_vcpu_id; > evtchn->last_priority = evtchn->priority; > > @@ -191,7 +214,7 @@ static void evtchn_fifo_set_pending(stru > &v->evtchn_fifo->control_block->ready) ) > vcpu_mark_events_pending(v); > } > - > +done: Labels indented by at least one space please. > if ( !was_pending ) > evtchn_check_pollers(d, port); > } Apart from that - what does this mean for the 2/2 patch you reply to here? Apply it or wait (I assume the latter)? If wait, is 1/2 still fine to apply? Jan ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] evtchn/fifo: don't corrupt queues if an old tail is linked 2013-11-22 12:02 ` Jan Beulich @ 2013-11-22 18:23 ` David Vrabel 2013-11-25 9:10 ` Jan Beulich 0 siblings, 1 reply; 10+ messages in thread From: David Vrabel @ 2013-11-22 18:23 UTC (permalink / raw) To: Jan Beulich; +Cc: Keir Fraser, xen-devel On 22/11/13 12:02, Jan Beulich wrote: >>>> On 20.11.13 at 18:21, David Vrabel <david.vrabel@citrix.com> wrote: >> 2. Check for the old_q changing after locking old_q->lock and use >> test_and_set_bit(LINKED) to bail early if another CPU linked it (see >> below patch). >> >> Any opinions on either of these solutions? > > I'd favor 2, but ... > >> --- a/xen/common/event_fifo.c Tue Nov 19 11:06:54 2013 +0000 >> +++ b/xen/common/event_fifo.c Wed Nov 20 16:41:32 2013 +0000 >> @@ -34,6 +34,30 @@ static inline event_word_t *evtchn_fifo_ >> return d->evtchn_fifo->event_array[p] + w; >> } >> >> +static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d, >> + struct evtchn *evtchn, >> + unsigned long *flags) >> +{ >> + struct vcpu *v; >> + struct evtchn_fifo_queue *q, *old_q; >> + >> + for (;;) >> + { >> + v = d->vcpu[evtchn->last_vcpu_id]; >> + old_q = &v->evtchn_fifo->queue[evtchn->last_priority]; >> + >> + spin_lock_irqsave(&old_q->lock, *flags); >> + >> + v = d->vcpu[evtchn->last_vcpu_id]; >> + q = &v->evtchn_fifo->queue[evtchn->last_priority]; >> + >> + if ( old_q == q ) >> + return old_q; >> + >> + spin_unlock_irqrestore(&old_q->lock, *flags); >> + } > > ... is there a guaranteed upper bound to this loop? No :( But the only attack I could think of seems highly implausible. A malicious guest A with two co-operating guests (B and C) can ping-pong one of its queue locks (Q) between two VCPUs with repeated EVTCHNOP_send calls on two interdomain event channels bound to A. They need to be in different domains otherwise there is a window were Q will not be locked. 
If the time spent while holding Q is less than the time spent in the
hypercall while not holding the lock, the guest will need more
co-operating guests to keep Q constantly locked.

If Guest A then has another two co-operating guests (D and E), it can
arrange for them to ping-pong another queue lock (R) between two VCPUs.

Guest A can also repeatedly change the priority of these four events.
With careful timing it will be able to change the priority such that
every send call moves the event between the two queues.

Guest A must also immediately clear any LINKED bit to prevent the
unmask calls from taking the 'already LINKED' fast path in
evtchn_fifo_set_pending().  This is trivial to do by just repeatedly
writing 0 to the relevant event words.

Guest V (the victim) then attempts to acquire the old queue lock Q.
If it manages to lock it, it will now be the wrong lock and it must
try and acquire R.  If it manages to acquire R it will again be the
wrong lock.  And so on.

There might be an easier attack but I couldn't see it.

Do you think this is a real problem that should be resolved?

> Apart from that - what does this mean for the 2/2 patch you reply
> to here? Apply it or wait (I assume the latter)? If wait, is 1/2 still
> fine to apply?

Please apply 1/2 and wait for a revised 2/2.

David

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] evtchn/fifo: don't corrupt queues if an old tail is linked 2013-11-22 18:23 ` David Vrabel @ 2013-11-25 9:10 ` Jan Beulich 0 siblings, 0 replies; 10+ messages in thread From: Jan Beulich @ 2013-11-25 9:10 UTC (permalink / raw) To: David Vrabel; +Cc: Keir Fraser, xen-devel >>> On 22.11.13 at 19:23, David Vrabel <david.vrabel@citrix.com> wrote: > On 22/11/13 12:02, Jan Beulich wrote: >>>>> On 20.11.13 at 18:21, David Vrabel <david.vrabel@citrix.com> wrote: >>> +static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d, >>> + struct evtchn *evtchn, >>> + unsigned long *flags) >>> +{ >>> + struct vcpu *v; >>> + struct evtchn_fifo_queue *q, *old_q; >>> + >>> + for (;;) >>> + { >>> + v = d->vcpu[evtchn->last_vcpu_id]; >>> + old_q = &v->evtchn_fifo->queue[evtchn->last_priority]; >>> + >>> + spin_lock_irqsave(&old_q->lock, *flags); >>> + >>> + v = d->vcpu[evtchn->last_vcpu_id]; >>> + q = &v->evtchn_fifo->queue[evtchn->last_priority]; >>> + >>> + if ( old_q == q ) >>> + return old_q; >>> + >>> + spin_unlock_irqrestore(&old_q->lock, *flags); >>> + } >> >> ... is there a guaranteed upper bound to this loop? > > No :( But the only attack I could think of seems highly implausible. > > A malicious guest A with two co-operating guests (B and C) can ping-pong > one of its queue locks (Q) between two VCPUs with repeated EVTCHNOP_send > calls on two interdomain event channels bound to A. They need to be in > different domains otherwise there is a window were Q will not be locked. > The time spent while holding Q is less than the time spent in the > hypercall while not holding the lock, then the guest will need more > co-operating guests to keep Q constantly locked. > > If Guest A then has another two co-operating guests (D and E), it can > arrange for them to ping-pong another queue lock (R) between two VCPUs. > > Guest A can also repeatedly change the priority of these four events. 
> With careful timing it will be able to change the priority such that > every send call moves the event between the two queues. > > Guest A must also immediately clear any LINKED bit to prevent the unmask > calls from taking the 'already LINKED' fast path in > evtchn_fifo_set_pending(). This is trivial to do by just repeatedly > writing 0 to the relevant event words. > > Guest V (the victim) then attempts to acquire the old queue lock Q. If > it manages to lock it, it will now be the wrong lock and it must try and > acquire R. If it manages to acquire R it will again be the wrong lock. > And so on. > > There might be an easier attack but I couldn't see it. > > Do you think this is a real problem that should be resolved? I think at least some arbitrary upper bound on the number of iterations should be put in, either failing the operation or at least logging a message when exceeding the bound. >> Apart from that - what does this mean for the 2/2 patch you reply >> to here? Apply it or wait (I assume the latter)? If wait, is 1/2 still >> fine to apply? > > Please apply 1/2 and wait for a revised 2/2. Will do. Jan ^ permalink raw reply [flat|nested] 10+ messages in thread
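The bounded retry Jan asks for can be sketched as a single-threaded user-space model. Everything here is illustrative: the spinlock is stubbed out, and the racing CPU that moves the event between queues is simulated by the `interfere` callback, which is invoked at the point where a real racing update could land.

```c
#include <stddef.h>

#define MODEL_VCPUS 2
#define MODEL_PRIOS 2

struct model_queue { int locked; };

struct model_evtchn {
    unsigned int last_vcpu_id;
    unsigned int last_priority;
};

static struct model_queue model_queues[MODEL_VCPUS][MODEL_PRIOS];

/* Example racing behaviours: move the event once, or on every attempt. */
static int model_moves;
static void move_once(struct model_evtchn *e)
{
    if ( model_moves++ == 0 )
        e->last_priority ^= 1;
}
static void move_always(struct model_evtchn *e)
{
    e->last_priority ^= 1;
}

/*
 * Sketch of a bounded lock_old_queue(): look up the queue named by the
 * last_* fields, "take" its lock, then re-check that the fields still
 * name the same queue.  Give up after 3 attempts so a malicious guest
 * cannot spin us forever.
 */
static struct model_queue *
lock_old_queue_model(struct model_evtchn *evtchn,
                     void (*interfere)(struct model_evtchn *))
{
    unsigned int attempt;

    for ( attempt = 0; attempt < 3; attempt++ )
    {
        struct model_queue *old_q =
            &model_queues[evtchn->last_vcpu_id][evtchn->last_priority];

        old_q->locked = 1;             /* spin_lock_irqsave(&old_q->lock) */
        if ( interfere )
            interfere(evtchn);         /* racing priority/VCPU change */

        if ( old_q == &model_queues[evtchn->last_vcpu_id]
                                   [evtchn->last_priority] )
            return old_q;              /* still the right queue: keep lock */

        old_q->locked = 0;             /* wrong queue: unlock and retry */
    }

    return NULL;                       /* too many changes: drop the event */
}
```

A one-off queue change converges on the second attempt; an adversarial caller that moves the event on every attempt exhausts the bound and the event is (deliberately) dropped, which is the trade-off the v6/v7 series accepts.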
* [PATCHv7 0/2] Xen: FIFO-based event channel fixes
  From: David Vrabel @ 2013-12-10 13:56 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, David Vrabel, Jan Beulich

This series fixes two bugs in the FIFO-based event channel ABI.

With the first bug, the priority of a newly bound event may not be the
default (it might have been an old priority for that port, or 0).

The second bug is triggered by moving events between queues (either
moving VCPUs or changing their priority).  This would cause events to
be lost.

Testing has been done with a process continually moving all event
channels between VCPUs.  This would previously fail in under an hour,
but with this fix the system stayed up for over 10 days.  It has also
been through a complete set of XenServer's automated regression tests
and no issues were found.

Changes in v7:
- Add patch to initialize priority of all newly bound events.

Changes in v6:
- Limit loop to acquire old_q->lock to 3 iterations.

Changes in v5:
- Only set READY bits for new heads.
- Rework old tail bug fix to cover all cases.

Changes in v4:
- const struct domain *
- Clear BUSY with existing cmpxchg() where possible.
- Fix BUSY bit debug output.

Changes in v3:
- Use a new BUSY bit to block guests from clearing UNMASKED; this is
  lower overhead than the previous solution (which required a hypercall).
- Fix another problem with moving events between queues.
- Add evtchn->last_vcpu_id and evtchn->last_priority instead of
  evtchn->q.  This keeps the structure at 32 bytes long.

Changes in v2:
- Add MAINTAINERS patch
- Remove some unnecessary temporary pending state clears
- Add fix for DoS

David

^ permalink raw reply	[flat|nested] 10+ messages in thread
* [PATCH 2/2] evtchn/fifo: don't corrupt queues if an old tail is linked
  From: David Vrabel @ 2013-12-10 13:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, David Vrabel, Jan Beulich

From: David Vrabel <david.vrabel@citrix.com>

An event may still be the tail of a queue even if the queue is now
empty (an 'old tail' event).  There is logic to handle the case when
this old tail event needs to be added to the now empty queue (by
checking for q->tail == port).

However, this does not cover all cases.

1. An old tail may be re-added simultaneously with another event.
   LINKED is set on the old tail, and the other CPU may misinterpret
   this as the old tail still being valid and set LINK instead of
   HEAD.  All events on this queue will then be lost.

2. If the old tail event on queue A is moved to a different queue B
   (by changing its VCPU or priority), the event may then be linked
   onto queue B.  When another event is linked onto queue A it will
   check the old tail, see that it is linked (but on queue B) and
   overwrite the LINK field, corrupting both queues.

When an event is linked, save the vcpu id and priority of the queue it
is being linked onto.  Use this when linking an event to check if it
is an unlinked old tail event.  If it is an old tail event, the old
queue is empty and old_q->tail is invalidated to ensure adding another
event to old_q will update HEAD.  The tail is invalidated by setting
it to 0, since event 0 is never linked.

The old_q->lock is held while setting LINKED to avoid the race with
the test of LINKED in evtchn_fifo_set_link().

Since an event channel may move queues after old_q->lock is acquired,
we must check that we have the correct lock and retry if not.
Since changing VCPUs or priority is expected to be rare events that are serialized in the guest, we try at most 3 times before dropping the event. This prevents a malicious guest from repeatedly adjusting priority to prevent another domain from acquiring old_q->lock. Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- xen/common/event_fifo.c | 80 ++++++++++++++++++++++++++++++++++++++++------- xen/include/xen/sched.h | 2 + 2 files changed, 70 insertions(+), 12 deletions(-) diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c index 2ab4c29..fc43e62 100644 --- a/xen/common/event_fifo.c +++ b/xen/common/event_fifo.c @@ -50,6 +50,36 @@ static void evtchn_fifo_init(struct domain *d, struct evtchn *evtchn) d->domain_id, evtchn->port); } +static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d, + struct evtchn *evtchn, + unsigned long *flags) +{ + struct vcpu *v; + struct evtchn_fifo_queue *q, *old_q; + unsigned int try; + + for ( try = 0; try < 3; try++ ) + { + v = d->vcpu[evtchn->last_vcpu_id]; + old_q = &v->evtchn_fifo->queue[evtchn->last_priority]; + + spin_lock_irqsave(&old_q->lock, *flags); + + v = d->vcpu[evtchn->last_vcpu_id]; + q = &v->evtchn_fifo->queue[evtchn->last_priority]; + + if ( old_q == q ) + return old_q; + + spin_unlock_irqrestore(&old_q->lock, *flags); + } + + gdprintk(XENLOG_WARNING, + "domain %d, port %d lost event (too many queue changes)\n", + d->domain_id, evtchn->port); + return NULL; +} + static int try_set_link(event_word_t *word, event_word_t *w, uint32_t link) { event_word_t new, old; @@ -119,7 +149,6 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) struct domain *d = v->domain; unsigned int port; event_word_t *word; - struct evtchn_fifo_queue *q; unsigned long flags; bool_t was_pending; @@ -136,25 +165,52 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) return; } - /* - * No locking around getting the queue. 
This may race with - * changing the priority but we are allowed to signal the event - * once on the old priority. - */ - q = &v->evtchn_fifo->queue[evtchn->priority]; - was_pending = test_and_set_bit(EVTCHN_FIFO_PENDING, word); /* * Link the event if it unmasked and not already linked. */ if ( !test_bit(EVTCHN_FIFO_MASKED, word) - && !test_and_set_bit(EVTCHN_FIFO_LINKED, word) ) + && !test_bit(EVTCHN_FIFO_LINKED, word) ) { + struct evtchn_fifo_queue *q, *old_q; event_word_t *tail_word; bool_t linked = 0; - spin_lock_irqsave(&q->lock, flags); + /* + * No locking around getting the queue. This may race with + * changing the priority but we are allowed to signal the + * event once on the old priority. + */ + q = &v->evtchn_fifo->queue[evtchn->priority]; + + old_q = lock_old_queue(d, evtchn, &flags); + if ( !old_q ) + goto done; + + if ( test_and_set_bit(EVTCHN_FIFO_LINKED, word) ) + { + spin_unlock_irqrestore(&old_q->lock, flags); + goto done; + } + + /* + * If this event was a tail, the old queue is now empty and + * its tail must be invalidated to prevent adding an event to + * the old queue from corrupting the new queue. + */ + if ( old_q->tail == port ) + old_q->tail = 0; + + /* Moved to a different queue? */ + if ( old_q != q ) + { + evtchn->last_vcpu_id = evtchn->notify_vcpu_id; + evtchn->last_priority = evtchn->priority; + + spin_unlock_irqrestore(&old_q->lock, flags); + spin_lock_irqsave(&q->lock, flags); + } /* * Atomically link the tail to port iff the tail is linked. @@ -166,7 +222,7 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) * If the queue is empty (i.e., we haven't linked to the new * event), head must be updated. 
*/ - if ( port != q->tail ) + if ( q->tail ) { tail_word = evtchn_fifo_word_from_port(d, q->tail); linked = evtchn_fifo_set_link(d, tail_word, port); @@ -182,7 +238,7 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) &v->evtchn_fifo->control_block->ready) ) vcpu_mark_events_pending(v); } - + done: if ( !was_pending ) evtchn_check_pollers(d, port); } diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index cbdf377..5ab92dd 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -98,6 +98,8 @@ struct evtchn } u; u8 priority; u8 pending:1; + u16 last_vcpu_id; + u8 last_priority; #ifdef FLASK_ENABLE void *ssid; #endif -- 1.7.2.5 ^ permalink raw reply related [flat|nested] 10+ messages in thread
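The tail handling in the final patch (invalidate a stale tail, then decide between linking onto the tail and installing a new HEAD via the `if ( q->tail )` check) can be modeled with a tiny sketch. This is a simplification under a stated assumption: the old queue and the current queue are the same (`old_q == q`), so the invalidation and the link decision happen on one structure; names are illustrative.

```c
#include <stdint.h>

struct fifo_queue_model {
    uint32_t head;
    uint32_t tail;
};

/*
 * Model of adding 'port' to a queue in the final patch's scheme:
 *  - re-adding the queue's own stale tail first invalidates the tail
 *    (set to 0; event 0 is never linked), so the event cannot be
 *    linked after itself;
 *  - a zero (invalidated/empty) tail means the event becomes the new
 *    HEAD, otherwise it is linked after the existing tail.
 * Returns 1 if a new HEAD was installed (the caller would then set the
 * READY bit, per patch 1/2).
 */
static int queue_add_model(struct fifo_queue_model *q, uint32_t port)
{
    int new_head = 0;

    if ( q->tail == port )     /* old tail: queue is really empty */
        q->tail = 0;

    if ( q->tail )
    {
        /* link tail_word -> port (LINK field update elided here) */
    }
    else
    {
        q->head = port;
        new_head = 1;
    }

    q->tail = port;
    return new_head;
}
```

The key property the model preserves is that re-adding an old tail yields a fresh HEAD rather than a self-link, which is exactly the corruption the `q->tail = 0` invalidation prevents.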
* Re: [PATCH 2/2] evtchn/fifo: don't corrupt queues if an old tail is linked
  2013-12-10 13:57 ` [PATCH 2/2] evtchn/fifo: don't corrupt queues if an old tail is linked David Vrabel
@ 2013-12-10 14:55   ` Jan Beulich
  2014-01-07 15:50   ` Keir Fraser
  1 sibling, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2013-12-10 14:55 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser

>>> On 10.12.13 at 14:57, David Vrabel <david.vrabel@citrix.com> wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> An event may still be the tail of a queue even if the queue is now
> empty (an 'old tail' event).  There is logic to handle the case when
> this old tail event needs to be added to the now empty queue (by
> checking for q->tail == port).
>
> However, this does not cover all cases.
>
> 1. An old tail may be re-added simultaneously with another event.
>    LINKED is set on the old tail, and the other CPU may misinterpret
>    this as the old tail still being valid and set LINK instead of
>    HEAD.  All events on this queue will then be lost.
>
> 2. If the old tail event on queue A is moved to a different queue B
>    (by changing its VCPU or priority), the event may then be linked
>    onto queue B.  When another event is linked onto queue A it will
>    check the old tail, see that it is linked (but on queue B) and
>    overwrite the LINK field, corrupting both queues.
>
> When an event is linked, save the vcpu id and priority of the queue it
> is being linked onto.  Use this when linking an event to check if it
> is an unlinked old tail event.  If it is an old tail event, the old
> queue is empty and old_q->tail is invalidated to ensure adding another
> event to old_q will update HEAD.  The tail is invalidated by setting
> it to 0 since the event 0 is never linked.
>
> The old_q->lock is held while setting LINKED to avoid the race with
> the test of LINKED in evtchn_fifo_set_link().
>
> Since a event channel may move queues after old_q->lock is acquired,
> we must check that we have the correct lock and retry if not.  Since
> changing VCPUs or priority is expected to be rare events that are
> serialized in the guest, we try at most 3 times before dropping the
> event.  This prevents a malicious guest from repeatedly adjusting
> priority to prevent another domain from acquiring old_q->lock.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> ---
>  xen/common/event_fifo.c |   80 ++++++++++++++++++++++++++++++++++++++++-------
>  xen/include/xen/sched.h |    2 +
>  2 files changed, 70 insertions(+), 12 deletions(-)
>
> diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
> index 2ab4c29..fc43e62 100644
> --- a/xen/common/event_fifo.c
> +++ b/xen/common/event_fifo.c
> @@ -50,6 +50,36 @@ static void evtchn_fifo_init(struct domain *d, struct
> evtchn *evtchn)
>                   d->domain_id, evtchn->port);
>  }
>
> +static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
> +                                                struct evtchn *evtchn,
> +                                                unsigned long *flags)
> +{
> +    struct vcpu *v;
> +    struct evtchn_fifo_queue *q, *old_q;
> +    unsigned int try;
> +
> +    for ( try = 0; try < 3; try++ )
> +    {
> +        v = d->vcpu[evtchn->last_vcpu_id];
> +        old_q = &v->evtchn_fifo->queue[evtchn->last_priority];
> +
> +        spin_lock_irqsave(&old_q->lock, *flags);
> +
> +        v = d->vcpu[evtchn->last_vcpu_id];
> +        q = &v->evtchn_fifo->queue[evtchn->last_priority];
> +
> +        if ( old_q == q )
> +            return old_q;
> +
> +        spin_unlock_irqrestore(&old_q->lock, *flags);
> +    }
> +
> +    gdprintk(XENLOG_WARNING,
> +             "domain %d, port %d lost event (too many queue changes)\n",
> +             d->domain_id, evtchn->port);
> +    return NULL;
> +}
> +
>  static int try_set_link(event_word_t *word, event_word_t *w, uint32_t link)
>  {
>      event_word_t new, old;
> @@ -119,7 +149,6 @@ static void evtchn_fifo_set_pending(struct vcpu *v,
> struct evtchn *evtchn)
>      struct domain *d = v->domain;
>      unsigned int port;
>      event_word_t *word;
> -    struct evtchn_fifo_queue *q;
>      unsigned long flags;
>      bool_t was_pending;
>
> @@ -136,25 +165,52 @@ static void evtchn_fifo_set_pending(struct vcpu *v,
> struct evtchn *evtchn)
>          return;
>      }
>
> -    /*
> -     * No locking around getting the queue.  This may race with
> -     * changing the priority but we are allowed to signal the event
> -     * once on the old priority.
> -     */
> -    q = &v->evtchn_fifo->queue[evtchn->priority];
> -
>      was_pending = test_and_set_bit(EVTCHN_FIFO_PENDING, word);
>
>      /*
>       * Link the event if it unmasked and not already linked.
>       */
>      if ( !test_bit(EVTCHN_FIFO_MASKED, word)
> -         && !test_and_set_bit(EVTCHN_FIFO_LINKED, word) )
> +         && !test_bit(EVTCHN_FIFO_LINKED, word) )
>      {
> +        struct evtchn_fifo_queue *q, *old_q;
>          event_word_t *tail_word;
>          bool_t linked = 0;
>
> -        spin_lock_irqsave(&q->lock, flags);
> +        /*
> +         * No locking around getting the queue.  This may race with
> +         * changing the priority but we are allowed to signal the
> +         * event once on the old priority.
> +         */
> +        q = &v->evtchn_fifo->queue[evtchn->priority];
> +
> +        old_q = lock_old_queue(d, evtchn, &flags);
> +        if ( !old_q )
> +            goto done;
> +
> +        if ( test_and_set_bit(EVTCHN_FIFO_LINKED, word) )
> +        {
> +            spin_unlock_irqrestore(&old_q->lock, flags);
> +            goto done;
> +        }
> +
> +        /*
> +         * If this event was a tail, the old queue is now empty and
> +         * its tail must be invalidated to prevent adding an event to
> +         * the old queue from corrupting the new queue.
> +         */
> +        if ( old_q->tail == port )
> +            old_q->tail = 0;
> +
> +        /* Moved to a different queue? */
> +        if ( old_q != q )
> +        {
> +            evtchn->last_vcpu_id = evtchn->notify_vcpu_id;
> +            evtchn->last_priority = evtchn->priority;
> +
> +            spin_unlock_irqrestore(&old_q->lock, flags);
> +            spin_lock_irqsave(&q->lock, flags);
> +        }
>
>          /*
>           * Atomically link the tail to port iff the tail is linked.
> @@ -166,7 +222,7 @@ static void evtchn_fifo_set_pending(struct vcpu *v,
> struct evtchn *evtchn)
>           * If the queue is empty (i.e., we haven't linked to the new
>           * event), head must be updated.
>           */
> -        if ( port != q->tail )
> +        if ( q->tail )
>          {
>              tail_word = evtchn_fifo_word_from_port(d, q->tail);
>              linked = evtchn_fifo_set_link(d, tail_word, port);
> @@ -182,7 +238,7 @@ static void evtchn_fifo_set_pending(struct vcpu *v,
> struct evtchn *evtchn)
>                             &v->evtchn_fifo->control_block->ready) )
>              vcpu_mark_events_pending(v);
>      }
> -
> + done:
>      if ( !was_pending )
>          evtchn_check_pollers(d, port);
>  }
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index cbdf377..5ab92dd 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -98,6 +98,8 @@ struct evtchn
>      } u;
>      u8 priority;
>      u8 pending:1;
> +    u16 last_vcpu_id;
> +    u8 last_priority;
>  #ifdef FLASK_ENABLE
>      void *ssid;
>  #endif
> -- 
> 1.7.2.5

^ permalink raw reply [flat|nested] 10+ messages in thread
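The `lock_old_queue()` helper in the patch above uses a general pattern: read an unstable pointer, take the lock it names, re-read under the lock to confirm nothing moved, and bound the retries so a hostile guest cannot livelock the caller by moving the event between queues forever. A simplified single-threaded sketch of that pattern follows; the names (`fqueue`, `MAX_TRIES`) are invented for illustration, and a plain int stands in for the spinlock.

```c
#include <assert.h>
#include <stddef.h>

#define NR_QUEUES 4
#define MAX_TRIES 3   /* the patch also gives up after 3 attempts */

struct fqueue {
    int locked;       /* stand-in for spin_lock_irqsave()/irqrestore() */
};

static struct fqueue queues[NR_QUEUES];

/* Which queue the event was last linked onto; may change concurrently. */
struct event {
    int last_queue;
};

/*
 * Lock the queue 'ev' was last linked onto.  Because last_queue can
 * change between reading it and acquiring the lock, re-read after
 * locking and retry if the event moved.  Bounded retries prevent a
 * malicious guest from repeatedly changing queues to block us.
 */
static struct fqueue *lock_old_queue(struct event *ev)
{
    unsigned int attempt;

    for ( attempt = 0; attempt < MAX_TRIES; attempt++ )
    {
        struct fqueue *old_q = &queues[ev->last_queue];

        old_q->locked = 1;                        /* "spin_lock(&old_q->lock)" */

        if ( old_q == &queues[ev->last_queue] )   /* still the same queue? */
            return old_q;                         /* yes: locked the right one */

        old_q->locked = 0;                        /* raced: unlock and retry */
    }

    return NULL;   /* too many queue changes: caller drops the event */
}
```

In this single-threaded model the race can never fire, so the first attempt always succeeds; in the hypervisor the re-read under the lock is what makes the result trustworthy, since a priority or VCPU change takes the same per-queue lock.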
* Re: [PATCH 2/2] evtchn/fifo: don't corrupt queues if an old tail is linked
  2013-12-10 13:57 ` [PATCH 2/2] evtchn/fifo: don't corrupt queues if an old tail is linked David Vrabel
  2013-12-10 14:55   ` Jan Beulich
@ 2014-01-07 15:50   ` Keir Fraser
  1 sibling, 0 replies; 10+ messages in thread
From: Keir Fraser @ 2014-01-07 15:50 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: Jan Beulich

On 10/12/2013 13:57, "David Vrabel" <david.vrabel@citrix.com> wrote:

> From: David Vrabel <david.vrabel@citrix.com>
>
> An event may still be the tail of a queue even if the queue is now
> empty (an 'old tail' event).  There is logic to handle the case when
> this old tail event needs to be added to the now empty queue (by
> checking for q->tail == port).
>
> However, this does not cover all cases.
>
> 1. An old tail may be re-added simultaneously with another event.
>    LINKED is set on the old tail, and the other CPU may misinterpret
>    this as the old tail still being valid and set LINK instead of
>    HEAD.  All events on this queue will then be lost.
>
> 2. If the old tail event on queue A is moved to a different queue B
>    (by changing its VCPU or priority), the event may then be linked
>    onto queue B.  When another event is linked onto queue A it will
>    check the old tail, see that it is linked (but on queue B) and
>    overwrite the LINK field, corrupting both queues.
>
> When an event is linked, save the vcpu id and priority of the queue it
> is being linked onto.  Use this when linking an event to check if it
> is an unlinked old tail event.  If it is an old tail event, the old
> queue is empty and old_q->tail is invalidated to ensure adding another
> event to old_q will update HEAD.  The tail is invalidated by setting
> it to 0 since the event 0 is never linked.
>
> The old_q->lock is held while setting LINKED to avoid the race with
> the test of LINKED in evtchn_fifo_set_link().
>
> Since a event channel may move queues after old_q->lock is acquired,
> we must check that we have the correct lock and retry if not.  Since
> changing VCPUs or priority is expected to be rare events that are
> serialized in the guest, we try at most 3 times before dropping the
> event.  This prevents a malicious guest from repeatedly adjusting
> priority to prevent another domain from acquiring old_q->lock.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---

Acked-by: Keir Fraser <keir@xen.org>

^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-01-07 15:50 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz  follow: Atom feed
-- links below jump to the message on this page --
2013-11-19 18:17 [PATCHv5 0/2] Xen: FIFO-based event channel ABI fixes David Vrabel
2013-11-19 18:17 ` [PATCH 1/2] evtchn/fifo: only set READY for new heads David Vrabel
2013-11-19 18:17 ` [PATCH 2/2] evtchn/fifo: don't corrupt queues if an old tail is linked David Vrabel
2013-11-20 17:21   ` David Vrabel
2013-11-22 12:02     ` Jan Beulich
2013-11-22 18:23       ` David Vrabel
2013-11-25  9:10         ` Jan Beulich
  -- strict thread matches above, loose matches on Subject: below --
2013-12-10 13:56 [PATCHv7 0/2] Xen: FIFO-based event channel fixes David Vrabel
2013-12-10 13:57 ` [PATCH 2/2] evtchn/fifo: don't corrupt queues if an old tail is linked David Vrabel
2013-12-10 14:55   ` Jan Beulich
2014-01-07 15:50   ` Keir Fraser