All of lore.kernel.org
 help / color / mirror / Atom feed
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
	Jan Beulich <JBeulich@suse.com>,
	xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: Xen-unstable: xen panic RIP:   dpci_softirq
Date: Tue, 18 Nov 2014 20:55:41 -0500	[thread overview]
Message-ID: <20141119015541.GA10307@laptop.dumpdata.com> (raw)
In-Reply-To: <348130555.20141118231254@eikelenboom.it>

On Tue, Nov 18, 2014 at 11:12:54PM +0100, Sander Eikelenboom wrote:
> 
> Tuesday, November 18, 2014, 9:56:33 PM, you wrote:
> 
> >> 
> >> Uhmm i thought i had these switched off (due to problems earlier and then forgot 
> >> about them .. however looking at the earlier reports these lines were also in 
> >> those reports).
> >> 
> >> The xen-syms and these last runs are all with a prestine xen tree cloned today (staging 
> >> branch), so the qemu-xen and seabios defined with that were also freshly cloned 
> >> and had a new default seabios config. (just to rule out anything stale in my tree)
> >> 
> >> If you don't see those messages .. perhaps your seabios and qemu trees (and at least the 
> >> seabios config) are not the most recent (they don't get updated automatically 
> >> when you just do a git pull on the main tree) ?
> >> 
> >> In /tools/firmware/seabios-dir/.config i have:
> >> CONFIG_USB=y
> >> CONFIG_USB_UHCI=y
> >> CONFIG_USB_OHCI=y
> >> CONFIG_USB_EHCI=y
> >> CONFIG_USB_XHCI=y
> >> CONFIG_USB_MSC=y
> >> CONFIG_USB_UAS=y
> >> CONFIG_USB_HUB=y
> >> CONFIG_USB_KEYBOARD=y
> >> CONFIG_USB_MOUSE=y
> >> 
> 
> > I seem to have the same thing. Perhaps it is my XHCI controller being wonky.
> 
> >> And this is all just from a:
> >> - git clone git://xenbits.xen.org/xen.git -b staging
> >> - make clean && ./configure && make -j6 && make -j6 install
> 
> > Aye. 
> > .. snip..
> >> >  1) test_and_[set|clear]_bit sometimes return unexpected values.
> >> >     [But this might be invalid as the addition of the ffff8303faaf25a8
> >> >      might be correct - as the second dpci the softirq is processing
> >> >      could be the MSI one]
> >> 
> >> Would there be an easy way to stress test this function separately in some 
> >> debugging function to see if it indeed is returning unexpected values ?
> 
> > Sadly no. But you got me looking in the right direction when you mentioned
> > 'timeout'.
> >> 
> >> >  2) INIT_LIST_HEAD operations on the same CPU are not honored.
> >> 
> >> Just curious, have you also tested the patches on AMD hardware ?
> 
> > Yes. To reproduce this the first thing I did was to get an AMD box.
> 
> >> 
> >>  
> >> >> When i look at the combination of (2) and (3), It seems it could be an 
> >> >> interaction between the two passed through devices and/or different IRQ types.
> >> 
> >> > Could be - as in it is causing this issue to show up faster than
> >> > expected. Or it is the one that triggers more than one dpci happening
> >> > at the same time.
> >> 
> >> Well that didn't seem to be it (see separate amendment i mailed previously)
> 
> > Right, the current theory I've is that the interrupts are not being
> > Acked within 8 milisecond and we reset the 'state' - and at the same
> > time we get an interrupt and schedule it - while we are still processing
> > the same interrupt. This would explain why the 'test_and_clear_bit'
> > got the wrong value.
> 
> > In regards to the list poison - following this thread of logic - with
> > the 'state = 0' set we open the floodgates for any CPU to put the same
> > 'struct hvm_pirq_dpci' on its list.
> 
> > We do reset the 'state' on _every_ GSI that is mapped to a guest - so
> > we also reset the 'state' for the MSI one (XHCI). Anyhow in your case:
> 
> > CPUX:                           CPUY:
> > pt_irq_time_out:
> > state = 0;                      
> > [out of timer coder, the                raise_softirq
> >  pirq_dpci is on the dpci_list]         [adds the pirq_dpci as state == 0]
> 
> > softirq_dpci                            softirq_dpci:
> >         list_del
> >         [entries poison]
> >                                                 list_del <= BOOM
> >                         
> > Is what I believe is happening.
> 
> > The INTX device - once I put a load on it - does not trigger
> > any pt_irq_time_out, so that would explain why I cannot hit this.
> 
> > But I believe your card hits these "hiccups".   
> 
> 
> Hi Konrad,
> 
> I just tested you 5 patches and as a result i still got an(other) host crash:
> (complete serial log attached)
> 
> (XEN) [2014-11-18 21:55:41.591] ----[ Xen-4.5.0-rc  x86_64  debug=y  Not tainted ]----
> (XEN) [2014-11-18 21:55:41.591] CPU:    0
> (XEN) [2014-11-18 21:55:41.591] ----[ Xen-4.5.0-rc  x86_64  debug=y  Not tainted ]----
> (XEN) [2014-11-18 21:55:41.591] RIP:    e008:[<ffff82d08012c7e7>]CPU:    2
> (XEN) [2014-11-18 21:55:41.591] RIP:    e008:[<ffff82d08014a461>] hvm_do_IRQ_dpci+0xbd/0x13c
> (XEN) [2014-11-18 21:55:41.591] RFLAGS: 0000000000010006    _spin_unlock+0x1f/0x30CONTEXT: hypervisor

Duh!

Here is another patch on top of the five you have (attached and inline).

>From 18008650f10199e2b83f74394f634a97086e34b8 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Tue, 18 Nov 2014 20:48:43 -0500
Subject: [PATCH] debug: Whether we want to clear the 'dom' field or not when
 changing the 'state' bit.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/drivers/passthrough/io.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index aad5607..24e2bd6 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -243,13 +243,14 @@ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
 }
 
 /*
- * Reset the pirq_dpci->dom parameter to NULL.
+ * Reset the pirq_dpci->dom parameter to NULL if 'clear_dom' is set.
  *
  * This function checks the different states to make sure it can do it
  * at the right time. If it unschedules the 'hvm_dirq_assist' from running
  * it also refcounts (which is what the softirq would have done) properly.
  */
-static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
+static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci,
+                                  unsigned int clear_dom)
 {
     struct domain *d = pirq_dpci->dom;
     struct __debug *debug = &__get_cpu_var(_d);
@@ -276,7 +277,8 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
          * in local variable before it sets STATE_RUN - and therefore will not
          * dereference '->dom' which would crash.
          */
-        pirq_dpci->dom = NULL;
+        if (clear_dom)
+            pirq_dpci->dom = NULL;
         break;
     }
 }
@@ -295,7 +297,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
                               &pirq_dpci->flags) )
     {
         _record(&debug->clear, pirq_dpci);
-        pt_pirq_softirq_reset(pirq_dpci);
+        pt_pirq_softirq_reset(pirq_dpci, 0 /* leave ->dom /);
         /* pirq_dpci->state = 0; <= OUCH! */
         pirq_dpci->pending = 0;
         pirq_guest_eoi(dpci_pirq(pirq_dpci));
@@ -333,9 +335,11 @@ static void pt_irq_time_out(void *data)
     pt_pirq_iterate(irq_map->dom, pt_irq_guest_eoi, NULL);
 
     spin_unlock(&irq_map->dom->event_lock);
+#if 0
     console_start_sync();
     dump_debug((char)0);
     console_end_sync();
+#endif
 }
 
 struct hvm_irq_dpci *domain_get_irq_dpci(const struct domain *d)
@@ -446,7 +450,7 @@ int pt_irq_create_bind(
                      * in the queue.
                      */
                     pirq_dpci->line = __LINE__;
-                    pt_pirq_softirq_reset(pirq_dpci);
+                    pt_pirq_softirq_reset(pirq_dpci, 1 /* clear dom */);
                 }
             }
             if ( unlikely(rc) )
@@ -701,7 +705,7 @@ int pt_irq_destroy_bind(
          * call to pt_pirq_softirq_reset.
          */
         pirq_dpci->line = __LINE__;
-        pt_pirq_softirq_reset(pirq_dpci);
+        pt_pirq_softirq_reset(pirq_dpci, 1 /* clear ->dom */);
 
         pirq_cleanup_check(pirq, d);
     }
-- 
1.9.3

  reply	other threads:[~2014-11-19  1:56 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-14 13:11 Xen-unstable: xen panic RIP: dpci_softirq Sander Eikelenboom
2014-11-14 13:32 ` Andrew Cooper
2014-11-14 13:37   ` Andrew Cooper
2014-11-14 13:57 ` Jan Beulich
2014-11-14 14:34   ` Sander Eikelenboom
2014-11-14 15:09     ` Jan Beulich
2014-11-14 15:20       ` Sander Eikelenboom
2014-11-14 15:43         ` Jan Beulich
2014-11-14 16:59           ` Sander Eikelenboom
2014-11-14 17:05             ` Konrad Rzeszutek Wilk
2014-11-14 19:56               ` Sander Eikelenboom
2014-11-14 20:25             ` Konrad Rzeszutek Wilk
2014-11-14 22:09               ` Sander Eikelenboom
2014-11-17 16:34                 ` Konrad Rzeszutek Wilk
2014-11-17 17:04                   ` Sander Eikelenboom
2014-11-17 20:43                     ` Konrad Rzeszutek Wilk
2014-11-17 22:40                       ` Sander Eikelenboom
2014-11-18  8:28                         ` Jan Beulich
     [not found]                         ` <20141118024927.GA32256@andromeda.dapyr.net>
2014-11-18 11:07                           ` Sander Eikelenboom
2014-11-18 15:09                             ` Sander Eikelenboom
2014-11-18 16:06                               ` Sander Eikelenboom
2014-11-18 16:16                               ` Konrad Rzeszutek Wilk
2014-11-18 17:03                                 ` Sander Eikelenboom
2014-11-18 17:20                                   ` Jan Beulich
2014-11-18 20:34                                     ` Konrad Rzeszutek Wilk
2014-11-18 20:56                                   ` Konrad Rzeszutek Wilk
2014-11-18 22:12                                     ` Sander Eikelenboom
2014-11-19  1:55                                       ` Konrad Rzeszutek Wilk [this message]
2014-11-19 11:16                                         ` Sander Eikelenboom
2014-11-19 15:04                                           ` Konrad Rzeszutek Wilk
2014-11-19 17:27                                             ` Sander Eikelenboom
2014-11-17 18:01                   ` Sander Eikelenboom
2014-11-18  9:30                     ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141119015541.GA10307@laptop.dumpdata.com \
    --to=konrad.wilk@oracle.com \
    --cc=JBeulich@suse.com \
    --cc=konrad@darnok.org \
    --cc=linux@eikelenboom.it \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.