From: Konrad Rzeszutek Wilk <konrad@darnok.org>
To: Roderick Colenbrander <thunderbird2k@gmail.com>
Cc: xen-devel@lists.xensource.com, Andreas Kinzler <ml-xen-devel@hfp.de>
Subject: Re: Questions about GPLPV stability tests
Date: Thu, 8 Dec 2011 18:46:24 -0500
Message-ID: <20111208234624.GD32474@konrad-lan>
In-Reply-To: <CAEc3jaBdQKmVJToNZRf+25GtJXqGc274ieHf_yHKRJ+YOKWdgA@mail.gmail.com>
On Wed, Dec 07, 2011 at 08:44:19PM +0000, Roderick Colenbrander wrote:
> On Wed, Dec 7, 2011 at 8:38 PM, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:
> >> It took about a week, but the systems went down again. Linux is down,
> >> but the hypervisor is still reachable on the serial console. Is there
> >> anything interesting to dump from there?
> >>
> > Just the interrupts information. I think that is Ctrl-A, Ctrl-A,
> > Ctrl-A, and then '*' to capture everything (I don't remember the right
> > one for just interrupts).
>
> Here's the interrupt information:
> (XEN) [2011-12-06 19:13:37] [i: dump interrupt bindings]
> (XEN) [2011-12-06 19:13:37] Guest interrupt information:
> (XEN) [2011-12-06 19:13:37] IRQ: 0
> affinity:00000000,00000000,00000000,00000001 vec:f0 type=IO-APIC-edge
> status=00000000 mapped, unbound
> (XEN) [2011-12-06 19:13:37] IRQ: 1
> affinity:00000000,00000000,00000000,00000001 vec:30 type=IO-APIC-edge
> status=00000014 in-flight=0 domain-list=0: 1(-S--),
> (XEN) [2011-12-06 19:13:37] IRQ: 2
> affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:e2 type=XT-PIC
> status=00000000 mapped, unbound
> (XEN) [2011-12-06 19:13:37] IRQ: 3
> affinity:00000000,00000000,00000000,00000001 vec:38 type=IO-APIC-edge
> status=00000006 mapped, unbound
> (XEN) [2011-12-06 19:13:37] IRQ: 4
> affinity:00000000,00000000,00000000,00000001 vec:40 type=IO-APIC-edge
> status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:37] IRQ: 5
> affinity:00000000,00000000,00000000,00000001 vec:f1 type=IO-APIC-edge
> status=00000000 mapped, unbound
> (XEN) [2011-12-06 19:13:37] IRQ: 6
> affinity:00000000,00000000,00000000,00000001 vec:48 type=IO-APIC-edge
> status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:37] IRQ: 7
> affinity:00000000,00000000,00000000,00000001 vec:50 type=IO-APIC-edge
> status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:37] IRQ: 8
> affinity:00000000,00000000,00000000,00000001 vec:58 type=IO-APIC-edge
> status=00000010 in-flight=0 domain-list=0: 8(-S--),
> (XEN) [2011-12-06 19:13:37] IRQ: 9
> affinity:00000000,00000000,00000000,00000001 vec:60 type=IO-APIC-level
> status=00000010 in-flight=0 domain-list=0: 9(-S--),
> (XEN) [2011-12-06 19:13:37] IRQ: 10
> affinity:00000000,00000000,00000000,00000001 vec:68 type=IO-APIC-edge
> status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:37] IRQ: 11
> affinity:00000000,00000000,00000000,00000001 vec:70 type=IO-APIC-edge
> status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:37] IRQ: 12
> affinity:00000000,00000000,00000000,00000001 vec:78 type=IO-APIC-edge
> status=00000010 in-flight=0 domain-list=0: 12(-S--),
> (XEN) [2011-12-06 19:13:37] IRQ: 13
> affinity:00000000,00000000,00000000,00000001 vec:88 type=IO-APIC-edge
> status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:37] IRQ: 14
> affinity:00000000,00000000,00000000,00000001 vec:90 type=IO-APIC-edge
> status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:37] IRQ: 15
> affinity:00000000,00000000,00000000,00000001 vec:98 type=IO-APIC-edge
> status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:37] IRQ: 16
> affinity:00000000,00000000,00000000,00000001 vec:c8 type=IO-APIC-level
> status=00000010 in-flight=0 domain-list=0: 16(-S--),1201: 16(--M-),
> (XEN) [2011-12-06 19:13:37] IRQ: 18
> affinity:00000000,00000000,00000000,00000001 vec:51 type=IO-APIC-level
> status=00000010 in-flight=0 domain-list=0: 18(-S--),
> (XEN) [2011-12-06 19:13:37] IRQ: 19
> affinity:00000000,00000000,00000000,00000001 vec:d0 type=IO-APIC-level
> status=00000010 in-flight=0 domain-list=0: 19(-S--),
> (XEN) [2011-12-06 19:13:37] IRQ: 21
> affinity:00000000,00000000,00000000,00000001 vec:61 type=IO-APIC-level
> status=00000010 in-flight=0 domain-list=0: 21(-S--),
> (XEN) [2011-12-06 19:13:37] IRQ: 23
> affinity:00000000,00000000,00000000,00000001 vec:59 type=IO-APIC-level
> status=00000010 in-flight=1 domain-list=0: 23(PS-M),
> (XEN) [2011-12-06 19:13:37] IRQ: 24
> affinity:00000000,00000000,00000000,00000001 vec:28 type=DMA_MSI
> status=00000000 mapped, unbound
> (XEN) [2011-12-06 19:13:38] IRQ: 25
> affinity:00000000,00000000,00000000,00000001 vec:a0 type=PCI-MSI
> status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:38] IRQ: 26
> affinity:00000000,00000000,00000000,00000001 vec:a8 type=PCI-MSI
> status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:38] IRQ: 27
> affinity:00000000,00000000,00000000,00000001 vec:b0 type=PCI-MSI
> status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:38] IRQ: 28
> affinity:00000000,00000000,00000000,00000001 vec:b8 type=PCI-MSI
> status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:38] IRQ: 29
> affinity:00000000,00000000,00000000,00000001 vec:c0 type=PCI-MSI
> status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:38] IRQ: 30
> affinity:00000000,00000000,00000000,00000001 vec:d8 type=PCI-MSI
> status=00000014 in-flight=0 domain-list=0:274(PS--),
> (XEN) [2011-12-06 19:13:38] IRQ: 31
> affinity:00000000,00000000,00000000,00000001 vec:21 type=PCI-MSI
> status=00000010 in-flight=0 domain-list=0:273(PS--),
> (XEN) [2011-12-06 19:13:38] IRQ: 32
> affinity:00000000,00000000,00000000,00000001 vec:29 type=PCI-MSI
> status=00000010 in-flight=0 domain-list=0:272(PS--),
> (XEN) [2011-12-06 19:13:38] IRQ: 33
> affinity:00000000,00000000,00000000,00000001 vec:31 type=PCI-MSI
> status=00000010 in-flight=0 domain-list=0:271(PS--),
> (XEN) [2011-12-06 19:13:38] IRQ: 34
> affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:39 type=PCI-MSI
> status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:38] IRQ: 35
> affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:41 type=PCI-MSI
> status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:38] IRQ: 36
> affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:49 type=PCI-MSI
> status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:38] IRQ: 37
> affinity:00000000,00000000,00000000,00000010 vec:65 type=PCI-MSI
> status=00000010 in-flight=0 domain-list=1201: 55(--M-),
> (XEN) [2011-12-06 19:13:38] IO-APIC interrupt information:
> (XEN) [2011-12-06 19:13:38] IRQ 0 Vec240:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 2: vector=240,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 1 Vec 48:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 1: vector=48,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 3 Vec 56:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 3: vector=56,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 4 Vec 64:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 4: vector=64,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 5 Vec241:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 5: vector=241,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 6 Vec 72:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 6: vector=72,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 7 Vec 80:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 7: vector=80,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 8 Vec 88:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 8: vector=88,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 9 Vec 96:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 9: vector=96,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=level, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 10 Vec104:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 10: vector=104,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 11 Vec112:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 11: vector=112,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 12 Vec120:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 12: vector=120,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 13 Vec136:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 13: vector=136,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 14 Vec144:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 14: vector=144,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 15 Vec152:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 15: vector=152,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 16 Vec200:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 16: vector=200,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=1,
> irr=0, trigger=level, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 18 Vec 81:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 18: vector=81,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=1,
> irr=0, trigger=level, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 19 Vec208:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 19: vector=208,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=1,
> irr=0, trigger=level, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 21 Vec 97:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 21: vector=97,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=1,
> irr=0, trigger=level, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38] IRQ 23 Vec 89:
> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 23: vector=89,
> delivery_mode=1, dest_mode=logical, delivery_status=1, polarity=1,
> irr=1, trigger=level, mask=0, dest_id:1
>
> I noticed some potential interesting things. The system in question is
> using PCI passthrough. On the serial console I remember seeing that
> the PCI device got unmapped from DomU and got mapped again in Dom0.
> The serial console log still had a lot of information about this DomU
> which I guess was going down. I guess it wasn't fully down yet.
One of the pieces of debug information that gets printed with '*' is the
guest PCI passthrough data, such as which IRQs, BARs, etc. are mapped in.
Were any of those present in the dump?
My original thought was that the interrupts had stopped working
completely. That does not look like the case: the interrupts appear to be
firing, and the event channels bound to them have the bits set saying
'hey, I am pending'. Oddly, there are a bunch of MSI ones that are masked,
which usually means they are disabled.
Then there are these two entries (IRQ 16 and IRQ 37):
affinity:00000000,00000000,00000000,00000001 vec:c8 type=IO-APIC-level
status=00000010 in-flight=0 domain-list=0: 16(-S--),1201: 16(--M-),
affinity:00000000,00000000,00000000,00000010 vec:65 type=PCI-MSI
status=00000010 in-flight=0 domain-list=1201: 55(--M-),
The guest has masked both interrupts, so it is off or dying, but somehow
the unmapping has not happened. In other words, exactly what you just
analyzed and found out.
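
For anyone who wants to repeat this reading on a longer dump, here is a minimal Python sketch (not part of the original exchange) that pulls the per-domain bindings out of the 'i' output. The function names are hypothetical. The flag interpretation is an assumption taken from the discussion in this thread: a 'P' is treated as "pending" and an 'M' as "masked"; the exact meaning of each of the four positions should be checked against the hypervisor source.

```python
import re

# One record per IRQ; the tail runs up to the next "(XEN)" prefix or end of input.
IRQ_RE = re.compile(
    r"IRQ:\s*(?P<irq>\d+)\s+"
    r"affinity:(?P<affinity>\S+)\s+"
    r"vec:(?P<vec>[0-9a-f]+)\s+"
    r"type=(?P<type>\S+)\s+"
    r"status=(?P<status>[0-9a-f]+)\s*"
    r"(?P<tail>.*?)(?=\(XEN\)|\Z)",
    re.DOTALL,
)
# Entries such as "1201: 16(--M-)": domain 1201, pirq 16, flags "--M-".
BINDING_RE = re.compile(r"(\d+):\s*(\d+)\(([-A-Z]{4})\)")


def parse_irq_dump(text):
    """Parse the 'Guest interrupt information' part of an 'i' dump."""
    # Drop mail quoting ("> ") so text pasted from the thread parses too.
    cleaned = "\n".join(re.sub(r"^(>\s?)+", "", line) for line in text.splitlines())
    records = []
    for m in IRQ_RE.finditer(cleaned):
        bindings = [
            {"domain": int(dom), "pirq": int(pirq), "flags": flags}
            for dom, pirq, flags in BINDING_RE.findall(m.group("tail"))
        ]
        records.append({
            "irq": int(m.group("irq")),
            "type": m.group("type"),
            "status": m.group("status"),
            "bindings": bindings,
        })
    return records


def bindings_for_domain(records, domain):
    """Return (irq, pirq, flags) for every binding that belongs to `domain`."""
    return [
        (rec["irq"], b["pirq"], b["flags"])
        for rec in records
        for b in rec["bindings"]
        if b["domain"] == domain
    ]


def is_masked(flags):
    # Assumption: any 'M' in the flag field means the guest side is masked.
    return "M" in flags


SAMPLE = """\
> (XEN) [2011-12-06 19:13:37] IRQ: 16
> affinity:00000000,00000000,00000000,00000001 vec:c8 type=IO-APIC-level
> status=00000010 in-flight=0 domain-list=0: 16(-S--),1201: 16(--M-),
> (XEN) [2011-12-06 19:13:38] IRQ: 37
> affinity:00000000,00000000,00000000,00000010 vec:65 type=PCI-MSI
> status=00000010 in-flight=0 domain-list=1201: 55(--M-),
"""

if __name__ == "__main__":
    for irq, pirq, flags in bindings_for_domain(parse_irq_dump(SAMPLE), 1201):
        print(irq, pirq, flags, "masked" if is_masked(flags) else "unmasked")
```

Run against the full dump, filtering on domain 1201 lists the bindings the dying guest still holds.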
Sadly, there is no smoking gun.
So there are a couple of things I would do. First, ensure that the Xen
hypervisor boots with 'sync_console console_to_ring'. Then, when this
crash happens, see if I can get a stack trace from both the Xen
hypervisor (there are some debug keys to get that - I think it is
'r'?) and the Linux kernel.
Also see if the problem disappears with the latest 4.1.x hypervisor.
It might also be worth upgrading the dom0 to a 3.0 kernel. Hmm, it would be
very interesting to see where the dom0 _is_ stuck. The hypervisor is
running fine, so it all points to dom0 crashing somewhere.
Can you make sure that dom0 runs with 'debug loglevel=8' as well? That
should give a wealth of information when/if a crash happens.
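
A quick way to confirm that the running dom0 kernel actually picked up those parameters is to look at /proc/cmdline. The snippet below is only a sketch with hypothetical helper names. It checks for the two parameters named above and nothing else.

```python
def missing_params(cmdline, required=("debug", "loglevel=8")):
    """Return the required parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in required if param not in present]


def check_running_kernel(path="/proc/cmdline"):
    # /proc/cmdline holds the command line of the currently running kernel.
    with open(path) as f:
        return missing_params(f.read())


if __name__ == "__main__":
    missing = check_running_kernel()
    print("missing:", " ".join(missing) if missing else "none")
```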
Oh, and try to pass in Alt-SysRq-t (on serial concentrators I think
you just need to type 'break' on the telnet line and then 't'). But I am
not entirely sure about it - you might want to double-check Google and see
how to enable Alt-SysRq.
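
On enabling SysRq: Linux exposes it through /proc/sys/kernel/sysrq, and a SysRq action can also be requested by writing its key to /proc/sysrq-trigger. The sketch below uses hypothetical helper names and must run as root on a real system. It only helps while dom0 userspace still responds, so the useful part is enabling SysRq ahead of time; once dom0 is hung, the serial break route described above is the one to try.

```python
def enable_sysrq(path="/proc/sys/kernel/sysrq"):
    # Writing "1" enables all SysRq functions.
    with open(path, "w") as f:
        f.write("1\n")


def trigger_sysrq(key, path="/proc/sysrq-trigger"):
    # Writing a single command character requests that SysRq action;
    # "t" asks the kernel to dump the task list to the kernel log.
    if len(key) != 1:
        raise ValueError("SysRq key must be a single character")
    with open(path, "w") as f:
        f.write(key)


if __name__ == "__main__":
    enable_sysrq()
    trigger_sysrq("t")
```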