From: Steve Freitas <sflist@ihonk.com>
To: Jan Beulich <JBeulich@suse.com>, pasik@iki.fi
Cc: Don Slutz <dslutz@verizon.com>, xen-devel@lists.xen.org
Subject: Re: Regression, host crash with 4.5rc1
Date: Mon, 10 Nov 2014 12:05:26 -0800
Message-ID: <54611A86.4000200@ihonk.com>
In-Reply-To: <54608A8B0200007800045E58@mail.emea.novell.com>

On 11/10/2014 0:51, Jan Beulich wrote:
>>>> On 10.11.14 at 09:03, <sflist@ihonk.com> wrote:
>> Sorry for the delay, took some debugging on another computer to get
>> serial logging working. Due to its size, I've posted the entire log of a
>> crashed session here: http://pastebin.com/AiPHUZRH In this case I used a
>> 3.0 gig memory size for the Windows domU.
>>
>> As I mentioned before, sometimes it's the SATA that goes first, other
>> times the tg3 ethernet driver. Also note that between 4.4.1 and 4.5rc1,
>> the kernel I'm using (stock Debian Jessie) has not changed.
>>
>> Please let me know if you need any other information. Thanks!
> Raising the kernel log level to maximum too would have helped.

Okay, I've done that, and the output is below. Let me know if there are
other logging flags you would prefer instead:

http://pastebin.com/M3yvWNTT

> Regardless of that, the first device showing anomalies here appears
> to be the UHCI controller:
>
>      [  147.415713] usb 7-1: reset low-speed USB device number 2 using uhci_hcd
>
> while booting the guest.

I assume this is related to the USB device (a keyboard) I'm passing 
through to the domU.

> And these
>
>      [  199.775209] pcieport 0000:00:03.0: AER: Multiple Corrected error received: id=0018
>      [  199.775238] pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0018(Transmitter ID)
>      [  199.775251] pcieport 0000:00:03.0:   device [8086:340a] error status/mask=00001100/00002000
>      [  199.775255] pcieport 0000:00:03.0:    [ 8] RELAY_NUM Rollover
>      [  199.775258] pcieport 0000:00:03.0:    [12] Replay Timer Timeout
>
> hint at a problem in the system's design. 00:03.0 is the parent bridge
> of 02:00.0 (and from what I can tell that's the only device behind that
> bridge), and hence the above messages can only reasonably have
> their origin at the passed through VGA device.
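
For reference, the "error status/mask=00001100/00002000" field in the AER
lines quoted above can be decoded by hand. Below is a minimal sketch,
assuming the bit layout of the PCIe AER Correctable Error Status register;
the table and function names are illustrative only and are not kernel
identifiers:

```python
# Bit positions of the PCIe AER Correctable Error Status register.
# Names follow the PCI Express specification; this table is an
# illustrative subset, not something taken from the kernel source.
AER_CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}


def decode_aer_correctable(value):
    """Return (bit, name) pairs for every bit set in a 32-bit value."""
    return [
        (bit, AER_CORRECTABLE_BITS.get(bit, "unknown/reserved"))
        for bit in range(32)
        if value & (1 << bit)
    ]


# The field as it appears in the log: status first, then mask.
status, mask = (int(field, 16) for field in "00001100/00002000".split("/"))
for label, value in (("status", status), ("mask", mask)):
    for bit, name in decode_aer_correctable(value):
        print(f"{label}: [{bit:2d}] {name}")
```

The two status bits correspond to the [ 8] and [12] lines in the log; the
mask has only bit 13 set, so neither reported error is masked.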

You are correct that the VGA card is the only device on 03.0:

root@g2:~# lspci -tv
-[0000:00]-+-00.0  Intel Corporation 5520 I/O Hub to ESI Port
           +-01.0-[01]----00.0  Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B
           +-03.0-[02]----00.0  NVIDIA Corporation GT200GL [Quadro FX 4800]
           +-07.0-[03]--
           +-14.0  Intel Corporation 7500/5520/5500/X58 I/O Hub System Management Registers
           +-14.1  Intel Corporation 7500/5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers
           +-14.2  Intel Corporation 7500/5520/5500/X58 I/O Hub Control Status and RAS Registers
           +-16.0  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.1  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.2  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.3  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.4  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.5  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.6  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-16.7  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
           +-1a.0  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
           +-1a.1  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
           +-1a.7  Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
           +-1b.0  Intel Corporation 82801JI (ICH10 Family) HD Audio Controller
           +-1c.0-[04]--
           +-1c.4-[05]----00.0  Broadcom Corporation NetXtreme BCM5755 Gigabit Ethernet PCI Express
           +-1c.5-[06-09]----00.0-[07-09]--+-02.0-[08]--
           |                               \-03.0-[09]----00.0 Broadcom Corporation NetXtreme BCM5754 Gigabit Ethernet PCI Express
           +-1d.0  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
           +-1d.1  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
           +-1d.2  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
           +-1d.3  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
           +-1d.7  Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
           +-1e.0-[0a]----0e.0  Advanced Micro Devices, Inc. [AMD/ATI] RV100 [Radeon 7000 / Radeon VE]
           +-1f.0  Intel Corporation 82801JIB (ICH10) LPC Interface Controller
           +-1f.2  Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller
           \-1f.3  Intel Corporation 82801JI (ICH10 Family) SMBus Controller
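
In case it helps anyone checking the same thing on their own box, here is a
rough sketch of pulling "what sits behind bridge 03.0" out of this output
programmatically. It assumes each device occupies a single unwrapped line
of `lspci -tv` text; the regex and the helper name devices_behind are my
own illustration, and it deliberately handles only bridges with one device
directly behind them (nested bridges such as 1c.5 are skipped):

```python
import re

# Matches one single-device bridge line of `lspci -tv`, for example:
#   +-03.0-[02]----00.0  NVIDIA Corporation GT200GL [Quadro FX 4800]
BRIDGE_LINE = re.compile(
    r"[+\\]-(?P<bridge>[0-9a-f]{2}\.[0-7])"          # bridge slot.function
    r"-\[(?P<buses>[0-9a-f]{2}(?:-[0-9a-f]{2})?)\]"  # secondary bus or range
    r"-+(?P<child>[0-9a-f]{2}\.[0-7])"               # device behind the bridge
    r"\s+(?P<desc>.+)$"                              # device description
)


def devices_behind(tree_text, bridge):
    """Return (buses, child_slot, description) tuples for the given bridge."""
    hits = []
    for line in tree_text.splitlines():
        m = BRIDGE_LINE.search(line)
        if m and m.group("bridge") == bridge:
            hits.append((m.group("buses"), m.group("child"), m.group("desc").strip()))
    return hits


# A three-line excerpt of the tree above, used as sample input.
sample = """\
-[0000:00]-+-00.0  Intel Corporation 5520 I/O Hub to ESI Port
           +-03.0-[02]----00.0  NVIDIA Corporation GT200GL [Quadro FX 4800]
           +-07.0-[03]--
"""
print(devices_behind(sample, "03.0"))
```

An empty bridge such as 07.0 yields an empty list, since no device line
follows its bus number.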

What problem in the system's design does this hint at?

> IOW it may well be that
> you were just lucky that things worked earlier on.

Certainly possible, but this is a very common machine in the corporate
world -- a Lenovo ThinkStation D20 running the X58 chipset. If it's an
inherent defect in the machine, I would be very surprised if nobody else
had already tripped over it.

> And btw - the title saying "host crash" seems to not match the provided
> log, as there's no sign of a crash anywhere (the host may be hung from
> what is visible). Was that just badly worded, or have you actually seen
> crashes too?
>

I've only seen hangs, never an actual crash. Sorry for the lack of
technical rigor in the title, but from the other end of the ethernet
cable, it might as well have crashed.

If the expanded logging doesn't tell you anything useful, I'll see if I 
can bisect the problem.

Thanks very much for your time.

Steve
