From mboxrd@z Thu Jan 1 00:00:00 1970 From: Boris Ostrovsky Subject: Re: [PATCH] xen/events/fifo: Consume unprocessed events when a CPU dies Date: Tue, 30 Jun 2015 08:26:30 -0400 Message-ID: <55928AF6.6050108@oracle.com> References: <1434726957-929-1-git-send-email-ross.lagerwall@citrix.com> <55843D1C.9070905@oracle.com> <55843DF9.5090704@citrix.com> <55911BB5.5030901@citrix.com> <559148F1.6000602@oracle.com> <559266A6.9080105@citrix.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: Received: from mail6.bemta14.messagelabs.com ([193.109.254.103]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1Z9ucm-0001CN-87 for xen-devel@lists.xenproject.org; Tue, 30 Jun 2015 12:26:52 +0000 In-Reply-To: <559266A6.9080105@citrix.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Ross Lagerwall , David Vrabel , xen-devel@lists.xenproject.org List-Id: xen-devel@lists.xenproject.org On 06/30/2015 05:51 AM, Ross Lagerwall wrote: > On 06/29/2015 02:32 PM, Boris Ostrovsky wrote: >> On 06/29/2015 06:19 AM, Ross Lagerwall wrote: >>> On 06/19/2015 05:06 PM, David Vrabel wrote: >>>> On 19/06/15 17:02, Boris Ostrovsky wrote: >>>>> On 06/19/2015 11:15 AM, Ross Lagerwall wrote: >>>>>> When a CPU is offlined, there may be unprocessed events on a port >>>>>> for >>>>>> that CPU. If the port is subsequently reused on a different CPU, it >>>>>> could be in an unexpected state with the link bit set, resulting in >>>>>> interrupts being missed. Fix this by consuming any unprocessed >>>>>> events >>>>>> for a particular CPU when that CPU dies. >>>>>> >>>>>> Signed-off-by: Ross Lagerwall >>>>>> --- >>>>>> drivers/xen/events/events_fifo.c | 23 ++++++++++++++++++----- >>>>>> 1 file changed, 18 insertions(+), 5 deletions(-) >>>>>> >>>>>> diff --git a/drivers/xen/events/events_fifo.c >>>>>> b/drivers/xen/events/events_fifo.c >>>>>> index 417415d..1dd0ba12 100644 >>>>>> --- a/drivers/xen/events/events_fifo.c >>>>>> +++ b/drivers/xen/events/events_fifo.c >>>>>> @@ -281,7 +281,8 @@ static void handle_irq_for_port(unsigned port) >>>>>> static void consume_one_event(unsigned cpu, >>>>>> struct evtchn_fifo_control_block >>>>>> *control_block, >>>>>> - unsigned priority, unsigned long *ready) >>>>>> + unsigned priority, unsigned long *ready, >>>>>> + bool drop) >>>>>> { >>>>>> struct evtchn_fifo_queue *q = &per_cpu(cpu_queue, cpu); >>>>>> uint32_t head; >>>>>> @@ -313,13 +314,17 @@ static void consume_one_event(unsigned cpu, >>>>>> if (head == 0) >>>>>> clear_bit(priority, ready); >>>>>> - if (evtchn_fifo_is_pending(port) && >>>>>> !evtchn_fifo_is_masked(port)) >>>>>> - handle_irq_for_port(port); >>>>>> + if (evtchn_fifo_is_pending(port) && >>>>>> !evtchn_fifo_is_masked(port)) { >>>>>> + if (unlikely(drop)) >>>>>> + pr_warn("Dropping pending event for port %u\n", port); >>>>> >>>>> Maybe pr_info (or pr_notice)? >>>> >>>> We want a warning here because we think this shouldn't happen -- if it >>>> does we actually need to retrigger the event on its new CPU. >>>> >>>>> Also, why not do this (testing for unprocessed events) in >>>>> xen_evtchn_close()? >>>> >>>> We can't do anything about them when closing because they may be in >>>> the >>>> middle of a queue. >> >> (Sorry, I missed this) >> >> Why can't (actually, why doesn't) the cpu that is being offlined drain >> its queue? >> > > Where would this be done? I thought using CPU notifiers was the > correct way to hook when a CPU goes down without having to stick fifo > event channel code in the core Xen code. In xen_evtchn_close(). We should be getting there (roughly) as cpu_die() -> xen_cpu_die() -> xen_smp_intr_free() -> unbind_from_irqhandler(). In fact, this path is taken right before cpu_down() sends CPU_DEAD notifications. I think cleaning up in xen_evtchn_close() is better because it is possible to close event channel for reasons other than CPU going away, in which case we also may need to deal with unprocessed events. (BTW, I noticed that you are cleaning up fifo events only. Do we need to do the same for 2-level?) -boris