xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: David Vrabel <david.vrabel@citrix.com>
To: David Vrabel <david.vrabel@citrix.com>
Cc: Keir Fraser <keir@xen.org>, Jan Beulich <jbeulich@suse.com>,
	xen-devel@lists.xen.org
Subject: Re: [PATCH 2/2] evtchn/fifo: don't corrupt queues if an old tail is linked
Date: Wed, 20 Nov 2013 17:21:16 +0000	[thread overview]
Message-ID: <528CEF8C.904@citrix.com> (raw)
In-Reply-To: <1384885023-11565-3-git-send-email-david.vrabel@citrix.com>

On 19/11/13 18:17, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
> 
> An event may still be the tail of a queue even if the queue is now
> empty (an 'old tail' event).  There is logic to handle the case when
> this old tail event needs to be added to the now empty queue (by
> checking for q->tail == port).
> 
> However, this does not cover all cases.
> 
> 1. An old tail may be re-added simultaneously with another event.
>    LINKED is set on the old tail, and the other CPU may misinterpret
>    this as the old tail still being valid and set LINK instead of
>    HEAD.  All events on this queue will then be lost.
> 
> 2. If the old tail event on queue A is moved to a different queue B
>    (by changing its VCPU or priority), the event may then be linked
>    onto queue B.  When another event is linked onto queue A it will
>    check the old tail, see that it is linked (but on queue B) and
>    overwrite the LINK field, corrupting both queues.
> 
> When an event is linked, save the vcpu id and priority of the queue it
> is being linked onto.  Use this when linking an event to check if it
> is an unlinked old tail event.  If it is an old tail event, the old
> queue is empty and old_q->tail is invalidated to ensure adding another
> event to old_q will update HEAD.  The tail is invalidated by setting
> it to 0 since the event 0 is never linked.
> 
> The old_q->lock is held while setting LINKED to avoid the race with
> the test of LINKED in evtchn_fifo_set_link().

Whilst this is considerably more reliable than before it isn't completely
safe.

evtchn_fifo_set_pending() is relying on being serialized for each event
channel. The serialization is required to protect evtchn->last_* and the
split test_bit(LINKED) and set_bit(LINKED).

In most cases the serialization is done by a suitable lock in
event_channel.c. e.g., interdomain event are serialized with the remote
domain's event lock, virqs by the local virq lock.

However, If evtchn_fifo_set_pending() is called from evtchn_fifo_unmask()
or evtchn_fifo_expand_array() then only the local event lock is held
which is different than the lock used for interdomain and virq events.

One race is:

  CPU A                         CPU B

  EVTCHNOP_send()
    evtchn_fifo_set_pending()
      evtchn->last_vcpu = ...
                                old_v = d->vcpu[evtchn->last_vcpu_id]
                                old_q = old_v->queue[evtchn->last_pri]
      evtchn->last_pri = ...
                                lock(old_q) // wrong q
      set_bit(LINKED)
      q->tail = port
      ...
  Guest:
     clear_bit(LINKED)
                                set_bit(LINKED)
                                // q->tail not invalidated
                                unlock(old_q)
                                lock(q)
                                link port to itself.
                                unlock(q)

Linking an event to itself will lose any other event that were in the
queue (they're LINKED but not reachable by the guest)

Another race is:

  CPU A                           CPU B

  EVTCHNOP_send()                 
     evtchn_fifo_set_pending()    clear_bit(MASKED)
       set_bit(PENDING)           test_bit(PENDING)
       test_bit(LINKED)           EVTCHNOP_unmask()
                                  evtchn_fifo_set_pending()
                                    test_bit(LINKED)
       lock(q->lock)
       set_bit(LINKED)
       link to q->tail
       q->tail = port
       unlock(q->lock)
                                    lock(q->lock)
                                    q->tail = 0 because q->tail == port
                                    q->head = port
                                    unlock(q->lock)

The double link of the event loses any other event preceding it in
the queue and those events are lost (they have LINKED set but are
no longer reachable by the guest).

There are two ways to fix this:

1. Introduce a per-event lock and use this to serialize
   evtchn_fifo_set_pending().  After 4.4, this per-event lock can be
   used instead of the domain's event lock and virq lock when raising
   events, hopefully reducing lock contention and improving event
   channel scalability.

   This would half the number of struct evtchn per page as the lock
   takes it over 32 bytes in size.

2. Check for the old_q changing after locking old_q->lock and use
   test_and_set_bit(LINKED) to bail early if another CPU linked it (see
   below patch).

Any opinions on either of these solutions?

--- a/xen/common/event_fifo.c	Tue Nov 19 11:06:54 2013 +0000
+++ b/xen/common/event_fifo.c	Wed Nov 20 16:41:32 2013 +0000
@@ -34,6 +34,30 @@ static inline event_word_t *evtchn_fifo_
     return d->evtchn_fifo->event_array[p] + w;
 }
 
+static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
+                                                struct evtchn *evtchn,
+                                                unsigned long *flags)
+{
+    struct vcpu *v;
+    struct evtchn_fifo_queue *q, *old_q;
+
+    for (;;)
+    {
+        v = d->vcpu[evtchn->last_vcpu_id];
+        old_q = &v->evtchn_fifo->queue[evtchn->last_priority];
+
+        spin_lock_irqsave(&old_q->lock, *flags);
+
+        v = d->vcpu[evtchn->last_vcpu_id];
+        q = &v->evtchn_fifo->queue[evtchn->last_priority];
+
+        if ( old_q == q )
+            return old_q;
+
+        spin_unlock_irqrestore(&old_q->lock, *flags);
+    }
+}          
+
 static int try_set_link(event_word_t *word, event_word_t *w, uint32_t link)
 {
     event_word_t new, old;
@@ -127,7 +151,6 @@ static void evtchn_fifo_set_pending(stru
     if ( !test_bit(EVTCHN_FIFO_MASKED, word)
          && !test_bit(EVTCHN_FIFO_LINKED, word) )
     {
-        struct vcpu *old_v;
         struct evtchn_fifo_queue *q, *old_q;
         event_word_t *tail_word;
         bool_t linked = 0;
@@ -139,10 +162,13 @@ static void evtchn_fifo_set_pending(stru
          */
         q = &v->evtchn_fifo->queue[evtchn->priority];
 
-        old_v = d->vcpu[evtchn->last_vcpu_id];
-        old_q = &old_v->evtchn_fifo->queue[evtchn->last_priority];
+        old_q = lock_old_queue(d, evtchn, &flags);
 
-        spin_lock_irqsave(&old_q->lock, flags);
+        if ( test_and_set_bit(EVTCHN_FIFO_LINKED, word) )
+        {
+            spin_unlock_irqrestore(&old_q->lock, flags);
+            goto done;
+        }
 
         /*
          * If this event was a tail, the old queue is now empty and
@@ -152,12 +178,9 @@ static void evtchn_fifo_set_pending(stru
         if ( old_q->tail == port )
             old_q->tail = 0;
 
-        set_bit(EVTCHN_FIFO_LINKED, word);
-
         /* Moved to a different queue? */
-        if ( old_q != q)
+        if ( old_q != q )
         {
-
             evtchn->last_vcpu_id = evtchn->notify_vcpu_id;
             evtchn->last_priority = evtchn->priority;
 
@@ -191,7 +214,7 @@ static void evtchn_fifo_set_pending(stru
                                   &v->evtchn_fifo->control_block->ready) )
             vcpu_mark_events_pending(v);
     }
-
+done:
     if ( !was_pending )
         evtchn_check_pollers(d, port);
 }

  reply	other threads:[~2013-11-20 17:21 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-19 18:17 [PATCHv5 0/2] Xen: FIFO-based event channel ABI fixes David Vrabel
2013-11-19 18:17 ` [PATCH 1/2] evtchn/fifo: only set READY for new heads David Vrabel
2013-11-19 18:17 ` [PATCH 2/2] evtchn/fifo: don't corrupt queues if an old tail is linked David Vrabel
2013-11-20 17:21   ` David Vrabel [this message]
2013-11-22 12:02     ` Jan Beulich
2013-11-22 18:23       ` David Vrabel
2013-11-25  9:10         ` Jan Beulich
  -- strict thread matches above, loose matches on Subject: below --
2013-12-10 13:56 [PATCHv7 0/2] Xen: FIFO-based event channel fixes David Vrabel
2013-12-10 13:57 ` [PATCH 2/2] evtchn/fifo: don't corrupt queues if an old tail is linked David Vrabel
2013-12-10 14:55   ` Jan Beulich
2014-01-07 15:50   ` Keir Fraser

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=528CEF8C.904@citrix.com \
    --to=david.vrabel@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=keir@xen.org \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).