From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1030675AbbD1Qf7 (ORCPT ); Tue, 28 Apr 2015 12:35:59 -0400
Received: from smtp.citrix.com ([66.165.176.89]:21861 "EHLO SMTP.CITRIX.COM"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1030378AbbD1Qf4 (ORCPT ); Tue, 28 Apr 2015 12:35:56 -0400
X-IronPort-AV: E=Sophos;i="5.11,664,1422921600"; d="scan'208";a="257400238"
Message-ID: <553FB540.4040707@citrix.com>
Date: Tue, 28 Apr 2015 17:28:48 +0100
From: David Vrabel
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.4.0
MIME-Version: 1.0
To: Boris Ostrovsky , ,
CC: , ,
Subject: Re: [Xen-devel] [PATCH 1/4] xen/events: Clear cpu_evtchn_mask before resuming
References: <1430236333-11905-1-git-send-email-boris.ostrovsky@oracle.com>
 <1430236333-11905-2-git-send-email-boris.ostrovsky@oracle.com>
In-Reply-To: <1430236333-11905-2-git-send-email-boris.ostrovsky@oracle.com>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 7bit
X-DLP: MIA2
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 28/04/15 16:52, Boris Ostrovsky wrote:
> When a guest is resumed, the hypervisor may change event channel
> assignments. If this happens and the guest uses 2-level events it
> is possible for the interrupt to be claimed by wrong VCPU since
> cpu_evtchn_mask bits may be stale. This can happen even though
> evtchn_2l_bind_to_cpu() attempts to clear old bits: irq_info that
> is passed in is not necessarily the original one (from pre-migration
> times) but instead is freshly allocated during resume and so any
> information about which CPU the channel was bound to is lost.
>
> Thus we should clear the mask during resume.
>
> We also need to make sure that bits for xenstore and console channels
> are set when these two subsystems are resumed.
> While rebind_evtchn_irq()
> (which is invoked for both of them on a resume) calls irq_set_affinity(),
> the latter will in fact postpone setting affinity until handling the
> interrupt. But because cpu_evtchn_mask will have bits for these two cleared
> we won't be able to take the interrupt.
>
> Setting IRQ_MOVE_PCNTXT flag for the two irqs avoids this problem by
> allowing to set affinity immediately, which is safe for event-channel-based
> interrupts.

[...]

> --- a/drivers/tty/hvc/hvc_xen.c
> +++ b/drivers/tty/hvc/hvc_xen.c
> @@ -533,6 +533,7 @@ static int __init xen_hvc_init(void)
>
>  		info = vtermno_to_xencons(HVC_COOKIE);
>  		info->irq = bind_evtchn_to_irq(info->evtchn);
> +		irq_set_status_flags(info->irq, IRQ_MOVE_PCNTXT);
>  	}
>  	if (info->irq < 0)
>  		info->irq = 0; /* NO_IRQ */
> diff --git a/drivers/xen/events/events_2l.c b/drivers/xen/events/events_2l.c
> index 5db43fc..7dd4631 100644
> --- a/drivers/xen/events/events_2l.c
> +++ b/drivers/xen/events/events_2l.c
> @@ -345,6 +345,15 @@ irqreturn_t xen_debug_interrupt(int irq, void *dev_id)
>  	return IRQ_HANDLED;
>  }
>
> +static void evtchn_2l_resume(void)
> +{
> +	int i;
> +
> +	for_each_online_cpu(i)
> +		memset(per_cpu(cpu_evtchn_mask, i), 0, sizeof(xen_ulong_t) *
> +				EVTCHN_2L_NR_CHANNELS/BITS_PER_EVTCHN_WORD);
> +}
> +
>  static const struct evtchn_ops evtchn_ops_2l = {
>  	.max_channels = evtchn_2l_max_channels,
>  	.nr_channels = evtchn_2l_max_channels,
> @@ -356,6 +365,7 @@ static const struct evtchn_ops evtchn_ops_2l = {
>  	.mask = evtchn_2l_mask,
>  	.unmask = evtchn_2l_unmask,
>  	.handle_events = evtchn_2l_handle_events,
> +	.resume = evtchn_2l_resume,
>  };
>
>  void __init xen_evtchn_2l_init(void)
> diff --git a/drivers/xen/xenbus/xenbus_comms.c b/drivers/xen/xenbus/xenbus_comms.c
> index fdb0f33..30203d1 100644
> --- a/drivers/xen/xenbus/xenbus_comms.c
> +++ b/drivers/xen/xenbus/xenbus_comms.c
> @@ -231,6 +231,7 @@ int xb_init_comms(void)
>  		}
>
>  		xenbus_irq = err;
> +		irq_set_status_flags(xenbus_irq, IRQ_MOVE_PCNTXT);

IRQ_MOVE_PCNTXT means "Interrupt can be migrated from process context"
which doesn't really sound relevant to me here?  Thomas Gleixner is
really not happy with misuse of IRQ APIs.

From the commit log the evtchn_2l_resume() function that's added sounds
like it fixes the problem on its own?

David
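
For readers following the thread, the stale-bit problem described in the
commit log can be sketched in plain user-space C. This is only an
illustration under stated assumptions, not the kernel code: the array
cpu_mask, the helpers bind_to_cpu(), claimants() and resume_clear_masks(),
and the sizes NR_CPUS and NR_CHANNELS are all hypothetical stand-ins for
the per-CPU cpu_evtchn_mask bitmaps and evtchn_2l_bind_to_cpu(). The point
it shows: if the caller's record of the old CPU is lost, the old bit is
never cleared, and two CPUs may claim the same port unless the masks are
wiped on resume.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical sizes, chosen only for illustration. */
#define NR_CPUS       4
#define NR_CHANNELS   128
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS    ((NR_CHANNELS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* User-space stand-in for the per-CPU cpu_evtchn_mask bitmaps. */
static unsigned long cpu_mask[NR_CPUS][MASK_WORDS];

static void mask_set(unsigned cpu, unsigned port)
{
	cpu_mask[cpu][port / BITS_PER_WORD] |= 1UL << (port % BITS_PER_WORD);
}

static void mask_clear(unsigned cpu, unsigned port)
{
	cpu_mask[cpu][port / BITS_PER_WORD] &= ~(1UL << (port % BITS_PER_WORD));
}

static int mask_test(unsigned cpu, unsigned port)
{
	return (cpu_mask[cpu][port / BITS_PER_WORD] >> (port % BITS_PER_WORD)) & 1;
}

/*
 * Bind a port to new_cpu. The old bit is cleared only on the CPU the
 * caller believes the port was bound to (old_cpu); if that bookkeeping
 * is wrong, a stale bit survives on the real old CPU.
 */
static void bind_to_cpu(unsigned port, unsigned old_cpu, unsigned new_cpu)
{
	assert(old_cpu < NR_CPUS && new_cpu < NR_CPUS && port < NR_CHANNELS);
	mask_clear(old_cpu, port);
	mask_set(new_cpu, port);
}

/* Number of CPUs whose mask says they should handle this port. */
static int claimants(unsigned port)
{
	int n = 0;

	for (unsigned cpu = 0; cpu < NR_CPUS; cpu++)
		n += mask_test(cpu, port);
	return n;
}

/* Analogue of the resume hook in the patch: wipe every CPU's mask. */
static void resume_clear_masks(void)
{
	memset(cpu_mask, 0, sizeof(cpu_mask));
}
```

Binding port 5 to CPU 2, then rebinding it to CPU 1 while wrongly
assuming it was on CPU 0, leaves two claimants; calling
resume_clear_masks() before the rebind leaves exactly one.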