From mboxrd@z Thu Jan 1 00:00:00 1970 From: Konrad Rzeszutek Wilk Subject: Re: [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU. Date: Mon, 12 Jan 2015 12:35:36 -0500 Message-ID: <20150112173536.GA24188@l.oracle.com> References: <1421081130-22959-1-git-send-email-konrad.wilk@oracle.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mail6.bemta3.messagelabs.com ([195.245.230.39]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1YAiu2-0000S7-EC for xen-devel@lists.xenproject.org; Mon, 12 Jan 2015 17:35:46 +0000 Content-Disposition: inline In-Reply-To: <1421081130-22959-1-git-send-email-konrad.wilk@oracle.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: jbeulich@suse.com, andrew.cooper3@citrix.com, xen-devel@lists.xenproject.org, linux@eikelenboom.it, malcolm.crossley@citrix.com List-Id: xen-devel@lists.xenproject.org On Mon, Jan 12, 2015 at 11:45:30AM -0500, Konrad Rzeszutek Wilk wrote: > There is race when we clear the STATE_SCHED in the softirq > - which allows the 'raise_softirq_for' to progress and > schedule another dpci. During that time the other CPU could > receive an interrupt and calls 'raise_softirq_for' and put > the dpci on its per-cpu list. There would be two 'dpci_softirq' > running at the same time (on different CPUs) where the > dpci state is STATE_RUN (and STATE_SCHED is cleared). This > ends up hitting: > > if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) ) > BUG() > > Instead of that put the dpci back on the per-cpu list to deal > with later. > > The reason we can get his with this is when an interrupt > affinity is set over multiple CPUs. > > Another potential fix would be to add a guard in the raise_softirq_for > to check for 'STATE_RUN' bit being set and not schedule the > dpci until that bit has been cleared. > > Reported-by: Sander Eikelenboom > Reported-by: Malcolm Crossley > Signed-off-by: Konrad Rzeszutek Wilk > --- > xen/drivers/passthrough/io.c | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c > index ae050df..9b77334 100644 > --- a/xen/drivers/passthrough/io.c > +++ b/xen/drivers/passthrough/io.c > @@ -804,7 +804,17 @@ static void dpci_softirq(void) > d = pirq_dpci->dom; > smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */ > if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) ) > - BUG(); > + { > + unsigned long flags; > + > + /* Put back on the list and retry. */ > + local_irq_save(flags); > + list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list)); I chatted with Jan on IRC this, and one worry is that if we add on our per-cpu list an 'dpci' that is running on another CPU - if the other CPU runs 'list_del' it will go BOOM. However, I am not sure if I can come up with a scenario in which this will be triggered - as by the time we get to checking the STATE_RUN condition the list removal has already been done. So adding in on the per-cpu list is OK - and since the STATE_SCHED is set, it guards against double-list addition. CPU1 CPU2 CPU3: softirq_dpci: list_del() from per-cpu-list test-and-set(STATE_RUN) test-and-clear(STATE_SCHED) .. raise_softirq test-and-set(STATE_SCHED) softirq_dpci list_del from per-cpu-list if-test-and-set(STATE_RUN) fails: list_add_tail(..) continue raise_softirq test-and-set(STATE_SCHED) [fails, so no adding] .. softirq_dpci list_del if-test-and-set(STATE_RUN) fails: list_add_tail(..) [OK, we did the list_del] continue.. clear_bit(STATE_RUN) softrqi_dpci list_del() test-and-set(STATE_RUN) test-and-clear(STATE_SCHED) > + local_irq_restore(flags); > + > + raise_softirq(HVM_DPCI_SOFTIRQ); > + continue; > + } > /* > * The one who clears STATE_SCHED MUST refcount the domain. > */ > -- > 2.1.0 >