All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Vrabel <david.vrabel@citrix.com>
To: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: xen-devel@lists.xenproject.org, Andrew Jones <drjones@redhat.com>
Subject: Re: [PATCH] evtchn: clean last_vcpu_id on EVTCHNOP_reset to avoid crash
Date: Fri, 8 Aug 2014 18:20:29 +0100	[thread overview]
Message-ID: <53E506DD.7020205@citrix.com> (raw)
In-Reply-To: <87a97f3tac.fsf@vitty.brq.redhat.com>

On 08/08/14 16:17, Vitaly Kuznetsov wrote:
> David Vrabel <david.vrabel@citrix.com> writes:
> 
>> On 08/08/14 15:22, Vitaly Kuznetsov wrote:
>>> When EVTCHNOP_reset is being performed last_vcpu_id attribute is not being
>>> cleaned by __evtchn_close(). In case last_vcpu_id != 0 for a particular
>>> event channel and this event channel is going to be used for event delivery
>>> (for another vcpu) before EVTCHNOP_init_control for vcpu == last_vcpu_id
>>> was done the following crash is observed:
>>>
>>>  ...
>>>  (XEN) Xen call trace:
>>>  (XEN)    [<ffff82d080127785>] _spin_lock_irqsave+0x5/0x70
>>>  (XEN)    [<ffff82d0801097db>] evtchn_fifo_set_pending+0xdb/0x370
>>>  (XEN)    [<ffff82d080107146>] evtchn_send+0xd6/0x160
>>>  (XEN)    [<ffff82d080107df9>] do_event_channel_op+0x6a9/0x16c0
>>>  (XEN)    [<ffff82d0801ce800>] vmx_intr_assist+0x30/0x480
>>>  (XEN)    [<ffff82d080219e99>] syscall_enter+0xa9/0xae
>>>
>>> This happens because lock_old_queue() does not check VCPU's control
>>> block existence and after EVTCHNOP_reset they are all cleaned.
>>>
>>> I suggest we fix the issue twice: reset last_vcpu_id to 0 in __evtchn_close()
>>> and add appropriate check to lock_old_queue() as lost event is much better
>>> than hypervisor crash.
>>>
>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>> ---
>>>  xen/common/event_channel.c | 3 +++
>>>  xen/common/event_fifo.c    | 9 +++++++++
>>>  2 files changed, 12 insertions(+)
>>>
>>> diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
>>> index a7becae..67b9d53 100644
>>> --- a/xen/common/event_channel.c
>>> +++ b/xen/common/event_channel.c
>>> @@ -578,6 +578,9 @@ static long __evtchn_close(struct domain *d1, int port1)
>>>      chn1->state          = ECS_FREE;
>>>      chn1->notify_vcpu_id = 0;
>>>  
>>> +    /* Reset last_vcpu_id to vcpu0 as control block can be freed */
>>> +    chn1->last_vcpu_id = 0;
>>
>> This is broken if the event channel is closed and rebound while the
>> event is linked.
>>
>> You can only safely clear chn->last_vcpu_id during evtchn_fifo_destroy().
>>
>> You also need to clear last_priority.
>>
> 
> Thanks, alternatively I can do that in evtchn_reset() after
> evtchn_fifo_destroy() as it is the only path leading to the issue. I
> wanted to avoid that to exclude additional loop for all event channels.
> 
>>> +
>>>      xsm_evtchn_close_post(chn1);
>>>  
>>>   out:
>>> diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
>>> index 51b4ff6..e4bef80 100644
>>> --- a/xen/common/event_fifo.c
>>> +++ b/xen/common/event_fifo.c
>>> @@ -61,6 +61,15 @@ static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
>>>      for ( try = 0; try < 3; try++ )
>>>      {
>>>          v = d->vcpu[evtchn->last_vcpu_id];
>>> +
>>> +        if ( !v->evtchn_fifo )
>>> +        {
>>> +            gdprintk(XENLOG_ERR,
>>> +                     "domain %d vcpu %d has no control block!\n",
>>> +                     d->domain_id, v->vcpu_id);
>>> +            return NULL;
>>> +        }
>>
>> I think this check needs to be in evtchn_fifo_init() to prevent the
>> event from being bound to VCPU that does not have a control block.
>>
> 
> I *think* it is not the issue here - the event is being bound to VCPU
> with this block initialized. But last_vcpu_id for this particular event
> channel points to some other VCPU which has not initialized its control
> block yet (so d->vcpu[evtchn->last_vcpu_id]->evtchn_fifo is NULL). There
> is no path to get in such situation (after we clear last_vcpu_id), I
> just wanted to put reasonable message here in case something will change
> in future.

Then evtchn_fifo_init() needs to check both the new VCPU and
last_vcpu_id have control blocks.

I much prefer failing the bind up front than detecting the problem later.

David

  reply	other threads:[~2014-08-08 17:20 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-08 14:22 [PATCH] evtchn: clean last_vcpu_id on EVTCHNOP_reset to avoid crash Vitaly Kuznetsov
2014-08-08 15:03 ` Jan Beulich
2014-08-08 15:05 ` David Vrabel
2014-08-08 15:17   ` Vitaly Kuznetsov
2014-08-08 17:20     ` David Vrabel [this message]
2014-08-11  9:01       ` Vitaly Kuznetsov
2014-08-11 10:03       ` Vitaly Kuznetsov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53E506DD.7020205@citrix.com \
    --to=david.vrabel@citrix.com \
    --cc=drjones@redhat.com \
    --cc=vkuznets@redhat.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.