xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: David Vrabel <david.vrabel@citrix.com>
Cc: xen-devel@lists.xenproject.org, Andrew Jones <drjones@redhat.com>
Subject: Re: [PATCH] evtchn: clean last_vcpu_id on EVTCHNOP_reset to avoid crash
Date: Mon, 11 Aug 2014 12:03:07 +0200	[thread overview]
Message-ID: <871tsn4a3o.fsf@vitty.brq.redhat.com> (raw)
In-Reply-To: <53E506DD.7020205@citrix.com> (David Vrabel's message of "Fri, 8 Aug 2014 18:20:29 +0100")

David Vrabel <david.vrabel@citrix.com> writes:

> On 08/08/14 16:17, Vitaly Kuznetsov wrote:
>> David Vrabel <david.vrabel@citrix.com> writes:
>> 
>>> On 08/08/14 15:22, Vitaly Kuznetsov wrote:
>>>> When EVTCHNOP_reset is being performed last_vcpu_id attribute is not being
>>>> cleaned by __evtchn_close(). In case last_vcpu_id != 0 for a particular
>>>> event channel and this event channel is going to be used for event delivery
>>>> (for another vcpu) before EVTCHNOP_init_control for vcpu == last_vcpu_id
>>>> was done the following crash is observed:
>>>>
>>>>  ...
>>>>  (XEN) Xen call trace:
>>>>  (XEN)    [<ffff82d080127785>] _spin_lock_irqsave+0x5/0x70
>>>>  (XEN)    [<ffff82d0801097db>] evtchn_fifo_set_pending+0xdb/0x370
>>>>  (XEN)    [<ffff82d080107146>] evtchn_send+0xd6/0x160
>>>>  (XEN)    [<ffff82d080107df9>] do_event_channel_op+0x6a9/0x16c0
>>>>  (XEN)    [<ffff82d0801ce800>] vmx_intr_assist+0x30/0x480
>>>>  (XEN)    [<ffff82d080219e99>] syscall_enter+0xa9/0xae
>>>>
>>>> This happens because lock_old_queue() does not check VCPU's control
>>>> block existence and after EVTCHNOP_reset they are all cleaned.
>>>>
>>>> I suggest we fix the issue twice: reset last_vcpu_id to 0 in __evtchn_close()
>>>> and add appropriate check to lock_old_queue() as lost event is much better
>>>> than hypervisor crash.
>>>>
>>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>>> ---
>>>>  xen/common/event_channel.c | 3 +++
>>>>  xen/common/event_fifo.c    | 9 +++++++++
>>>>  2 files changed, 12 insertions(+)
>>>>
>>>> diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
>>>> index a7becae..67b9d53 100644
>>>> --- a/xen/common/event_channel.c
>>>> +++ b/xen/common/event_channel.c
>>>> @@ -578,6 +578,9 @@ static long __evtchn_close(struct domain *d1, int port1)
>>>>      chn1->state          = ECS_FREE;
>>>>      chn1->notify_vcpu_id = 0;
>>>>  
>>>> +    /* Reset last_vcpu_id to vcpu0 as control block can be freed */
>>>> +    chn1->last_vcpu_id = 0;
>>>
>>> This is broken if the event channel is closed and rebound while the
>>> event is linked.
>>>
>>> You can only safely clear chn->last_vcpu_id during evtchn_fifo_destroy().
>>>
>>> You also need to clear last_priority.
>>>
>> 
>> Thanks, alternatively I can do that in evtchn_reset() after
>> evtchn_fifo_destroy() as it is the only path leading to the issue. I
>> wanted to avoid that to exclude additional loop for all event channels.
>> 
>>>> +
>>>>      xsm_evtchn_close_post(chn1);
>>>>  
>>>>   out:
>>>> diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
>>>> index 51b4ff6..e4bef80 100644
>>>> --- a/xen/common/event_fifo.c
>>>> +++ b/xen/common/event_fifo.c
>>>> @@ -61,6 +61,15 @@ static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
>>>>      for ( try = 0; try < 3; try++ )
>>>>      {
>>>>          v = d->vcpu[evtchn->last_vcpu_id];
>>>> +
>>>> +        if ( !v->evtchn_fifo )
>>>> +        {
>>>> +            gdprintk(XENLOG_ERR,
>>>> +                     "domain %d vcpu %d has no control block!\n",
>>>> +                     d->domain_id, v->vcpu_id);
>>>> +            return NULL;
>>>> +        }
>>>
>>> I think this check needs to be in evtchn_fifo_init() to prevent the
>>> event from being bound to VCPU that does not have a control block.
>>>
>> 
>> I *think* it is not the issue here - the event is being bound to VCPU
>> with this block initialized. But last_vcpu_id for this particular event
>> channel points to some other VCPU which has not initialized its control
>> block yet (so d->vcpu[evtchn->last_vcpu_id]->evtchn_fifo is NULL). There
>> is no path to get in such situation (after we clear last_vcpu_id), I
>> just wanted to put reasonable message here in case something will change
>> in future.
>
> Then evtchn_fifo_init() needs to check both the new VCPU and
> last_vcpu_id have control blocks.
>
> I much prefer failing the bind up front than detecting the problem later.

I discovered an issue with such approach: xen_setup_timer() is being
called earlier that EVTCHNOP_init_control for all secondary VCPUs in
current linux kernel (but after VCPU0 switched ABI to FIFO). This means
evtchn_fifo_init() is being called for VIRQ event channel which has
notify_vcpu_id pointing to a VCPU with uninitialized control block. And
if we reset is to VCPU0 we'll break everything...

-- 
  Vitaly

      parent reply	other threads:[~2014-08-11 10:03 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-08 14:22 [PATCH] evtchn: clean last_vcpu_id on EVTCHNOP_reset to avoid crash Vitaly Kuznetsov
2014-08-08 15:03 ` Jan Beulich
2014-08-08 15:05 ` David Vrabel
2014-08-08 15:17   ` Vitaly Kuznetsov
2014-08-08 17:20     ` David Vrabel
2014-08-11  9:01       ` Vitaly Kuznetsov
2014-08-11 10:03       ` Vitaly Kuznetsov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871tsn4a3o.fsf@vitty.brq.redhat.com \
    --to=vkuznets@redhat.com \
    --cc=david.vrabel@citrix.com \
    --cc=drjones@redhat.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).