From: Shaun Ruffell <sruffell@digium.com>
To: Ian Kumlien <ian.kumlien@gmail.com>
Cc: linux-netdev@vger.kernel.org,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Russ Meyerriecks <rmeyerriecks@digium.com>
Subject: Re: [igb] AER timeout - resend.
Date: Wed, 1 Jul 2015 12:58:59 -0500
Message-ID: <20150701175859.GA89727@digium.com>
In-Reply-To: <CAA85sZvM_cOq8JEVjBkBvP7BZpNTjuN3_4E8b=xpyZfm_8Vr-Q@mail.gmail.com>
On Mon, Feb 23, 2015 at 03:56:56PM +0100, Ian Kumlien wrote:
> Sending this to both netdev and kernel since I don't know if it's the
> driver or the PCIe AER that does something odd - the machine was
> stable before 3.19 and PCIe AER.
>
> Everything started out like what I first sent to linux nics () intel:
> ------
>
> And today I had some issues and wondered why things were broken. I was met with:
>
> [950016.366477] pcieport 0000:00:04.0: AER: Uncorrected (Non-Fatal)
> error received: id=0500
> [950016.366495] igb 0000:05:00.0: PCIe Bus Error: severity=Uncorrected
> (Non-Fatal), type=Transaction Layer, id=0500(Requester ID)
> [950016.366502] igb 0000:05:00.0: device [8086:1521] error
> status/mask=00004000/00000000
> [950016.366509] igb 0000:05:00.0: [14] Completion Timeout
> [950016.366519] igb 0000:05:00.0: broadcast error_detected message
> [950016.379742] br0: port 1(enp5s0f0) entered disabled state
> [950016.488213] igb 0000:05:00.0: broadcast slot_reset message
> [950016.588014] igb 0000:05:00.0: broadcast resume message
> [950016.752654] igb 0000:05:00.0: AER: Device recovery successful
> [950019.817249] igb 0000:05:00.1 enp5s0f1: igb: enp5s0f1 NIC Link is
> Up 1000 Mbps Full Duplex, Flow Control: RX/TX
> [950020.699773] igb 0000:05:00.0 enp5s0f0: igb: enp5s0f0 NIC Link is
> Up 1000 Mbps Full Duplex, Flow Control: RX
> [950020.701485] br0: port 1(enp5s0f0) entered forwarding state
> [950020.701504] br0: port 1(enp5s0f0) entered forwarding state
> [976152.448092] ata5: exception Emask 0x50 SAct 0x0 SErr 0x4090800
> action 0xe frozen
> [976152.448100] ata5: irq_stat 0x00400040, connection status changed
> [976152.448107] ata5: SError: { HostInt PHYRdyChg 10B8B DevExch }
> [976152.448117] ata5: hard resetting link
> [976152.448134] ata6: exception Emask 0x50 SAct 0x0 SErr 0x4090800
> action 0xe frozen
> [976152.448140] ata6: irq_stat 0x00400040, connection status changed
> [976152.448147] ata6: SError: { HostInt PHYRdyChg 10B8B DevExch }
> [976152.448155] ata6: hard resetting link
> [976153.171195] ata6: SATA link down (SStatus 0 SControl 300)
> [976158.174058] ata6: hard resetting link
> [976158.174110] ata5: SATA link down (SStatus 0 SControl 300)
> [976163.176997] ata5: hard resetting link
> [976163.480133] ata6: SATA link down (SStatus 0 SControl 300)
> [976163.480147] ata6: limiting SATA link speed to 1.5 Gbps
> [976168.483028] ata6: hard resetting link
> [976168.483095] ata5: SATA link down (SStatus 0 SControl 300)
> [976168.483108] ata5: limiting SATA link speed to 1.5 Gbps
> [976173.485907] ata5: hard resetting link
> [976173.789066] ata6: SATA link down (SStatus 0 SControl 310)
> [976173.789080] ata6.00: disabled
> [976173.791066] ata6: EH complete
> [976173.791078] ata5: SATA link down (SStatus 0 SControl 310)
> [976173.791085] ata6.00: detaching (SCSI 5:0:0:0)
> [976173.791090] ata5.00: disabled
> [976173.794073] ata5: EH complete
> [976173.794100] ata5.00: detaching (SCSI 4:0:0:0)
> [976173.794968] sd 5:0:0:0: [sdb] Synchronizing SCSI cache
> [976173.795073] sd 5:0:0:0: [sdb] Synchronize Cache(10) failed:
> Result: hostbyte=0x04 driverbyte=0x00
> [976173.795080] sd 5:0:0:0: [sdb] Stopping disk
> [976173.795108] sd 5:0:0:0: [sdb] Start/Stop Unit failed: Result:
> hostbyte=0x04 driverbyte=0x00
> [976173.797180] sd 4:0:0:0: [sda] Synchronizing SCSI cache
> [976173.797254] sd 4:0:0:0: [sda] Synchronize Cache(10) failed:
> Result: hostbyte=0x04 driverbyte=0x00
> [976173.797261] sd 4:0:0:0: [sda] Stopping disk
> [976173.797285] sd 4:0:0:0: [sda] Start/Stop Unit failed: Result:
> hostbyte=0x04 driverbyte=0x00
>
> So two out of two disks just failed and aren't replying anymore?
>
> Seven hours after an AER, this machine's Intel SSDs, which are idle,
> just fail to respond? ;)
>
> Anyway, I will reboot it when I get home - any ideas or suggestions are
> more than welcome.
Hi Ian,
Did you ever find a resolution to this? I'm seeing something very
similar: a customer upgrades to 3.19, then sees AER errors and the
links are brought down, but 3.10 works fine.
Thanks,
Shaun
Thread overview: 3+ messages
2015-02-23 14:56 [igb] AER timeout - resend Ian Kumlien
2015-07-01 17:58 ` Shaun Ruffell [this message]
[not found] ` <CAA85sZtOw8R3bHMkwp4YmtubdYHQ0NVvqHtQwBvN=FnTWG4iow@mail.gmail.com>
2015-07-01 21:30 ` Shaun Ruffell