Re: Areca hardware RAID / first-ever SCSI bus reset: am I about to lose this disk controller?

From: Stan Hoeppner <stan@hardwarefreak.com>
To: Nix <nix@esperi.org.uk>
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Areca hardware RAID / first-ever SCSI bus reset: am I about to lose this disk controller?
Date: Wed, 19 Sep 2012 17:30:56 -0500
Message-ID: <505A47A0.1060709@hardwarefreak.com>
In-Reply-To: <87mx0lsv4d.fsf@spindle.srvr.nix>

On 9/19/2012 1:52 PM, Nix wrote:
> So I have this x86-64 server running Linux 3.5.1 

When did you install 3.5.1 on this machine?  If fairly recently, does it
run without these errors when booted into the previous kernel?

> with a SATA-on-PCIe
> Areca 1210 hardware RAID-5 controller driven by libata which has been
> humming along happily for years -- but suddenly, today, the entire
> machine froze for a couple of minutes (or at least fs access froze),
> followed by this in the logs:
> 
> Sep 19 16:55:47 spindle notice: [3447524.381843] arcmsr0: abort device command of scsi id = 0 lun = 1 
> [... repeated a few times at intervals over the next five minutes,
>  followed by a mass of them at 16:59:29, and...]
> Sep 19 16:59:25 spindle err: [3447657.821450] arcmsr: executing bus reset eh.....num_resets = 0, num_aborts = 33 
> Sep 19 16:59:25 spindle notice: [3447697.878386] arcmsr0: wait 'abort all outstanding command' timeout 
> Sep 19 16:59:25 spindle notice: [3447697.878628] arcmsr0: executing hw bus reset .....
> Sep 19 16:59:25 spindle err: [3447698.287054] irq 16: nobody cared (try booting with the "irqpoll" option)
> Sep 19 16:59:25 spindle warning: [3447698.287291] Pid: 0, comm: swapper/4 Not tainted 3.5.1-dirty #1
> Sep 19 16:59:25 spindle warning: [3447698.287522] Call Trace:
> Sep 19 16:59:25 spindle warning: [3447698.287754]  <IRQ>  [<ffffffff810af5ba>] __report_bad_irq+0x31/0xc2
> Sep 19 16:59:25 spindle warning: [3447698.288031]  [<ffffffff810af84e>] note_interrupt+0x16a/0x1e8
> Sep 19 16:59:25 spindle warning: [3447698.288263]  [<ffffffff810ad9d5>] handle_irq_event_percpu+0x163/0x1a5
> Sep 19 16:59:25 spindle warning: [3447698.288497]  [<ffffffff810ada4f>] handle_irq_event+0x38/0x55
> Sep 19 16:59:25 spindle warning: [3447698.288727]  [<ffffffff810b01a0>] handle_fasteoi_irq+0x78/0xab
> Sep 19 16:59:25 spindle warning: [3447698.288960]  [<ffffffff8103631c>] handle_irq+0x24/0x2a
> Sep 19 16:59:25 spindle warning: [3447698.289189]  [<ffffffff81036229>] do_IRQ+0x4d/0xb4
> Sep 19 16:59:25 spindle warning: [3447698.289419]  [<ffffffff815070e7>] common_interrupt+0x67/0x67
> Sep 19 16:59:25 spindle warning: [3447698.289648]  <EOI>  [<ffffffff812ab174>] ? acpi_idle_enter_c1+0xcb/0xf2
> Sep 19 16:59:25 spindle warning: [3447698.289919]  [<ffffffff812ab152>] ? acpi_idle_enter_c1+0xa9/0xf2
> Sep 19 16:59:25 spindle warning: [3447698.290152]  [<ffffffff813c1446>] cpuidle_enter+0x12/0x14
> Sep 19 16:59:25 spindle warning: [3447698.290382]  [<ffffffff813c1902>] cpuidle_idle_call+0xc5/0x175
> Sep 19 16:59:25 spindle warning: [3447698.290614]  [<ffffffff8103c2da>] cpu_idle+0x5b/0xa5
> Sep 19 16:59:25 spindle warning: [3447698.290844]  [<ffffffff81ad4fcb>] start_secondary+0x1a2/0x1a6
> Sep 19 16:59:25 spindle err: [3447698.291074] handlers:
> Sep 19 16:59:25 spindle err: [3447698.291294] [<ffffffff8133b9a3>] usb_hcd_irq
> Sep 19 16:59:25 spindle emerg: [3447698.291553] Disabling IRQ #16
> Sep 19 16:59:25 spindle err: [3447710.888187] arcmsr0: waiting for hw bus reset return, retry=0
> Sep 19 16:59:25 spindle err: [3447720.882155] arcmsr0: waiting for hw bus reset return, retry=1
> Sep 19 16:59:25 spindle notice: [3447730.896410] Areca RAID Controller0: F/W V1.46 2009-01-06 & Model ARC-1210
> Sep 19 16:59:25 spindle err: [3447730.916348] arcmsr: scsi  bus reset eh returns with success
> 
> This is the first SCSI (that is, um, ATA) bus reset I have *ever* had on
> this machine, hence my concern. (The IRQ disable we can ignore: it was
> just bad luck that an interrupt destined for the Areca hit after the
> controller had briefly vanished from the PCI bus as part of resetting.)
> 
> Now just last week another (surge-protected) machine on the same power
> main as this one died without warning with a fried power supply which
> apparently roasted the BIOS and/or other motherboard components before
> it died (the ACPI DSDT was filled with rubbish, and other things must
> have been fried because even with ACPI off Linux wouldn't boot more than
> one time out of a hundred, freezing solid at different places in the
> boot each time). So my worry level when this SCSI bus reset turned up
> today is quite high. It's higher given that the controller logs
> (accessed via the Areca binary-only utility for this purpose) show no
> sign of any problem at all.
> 
> EDAC shows no PCI bus problems and no memory problems, so this probably
> *is* the controller.
> 
> So... is this a serious problem? Does anyone know if I'm about to lose
> this controller, or indeed machine as well? (I really, really hope not.)
> 
> I'd write this off as a spurious problem and not report it at all, but
> I'm jittery as heck after the catastrophic hardware failure last week,
> and when this happens in close proximity, I worry.
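
For comparing a run under the previous kernel against 3.5.1, here is a rough
sketch (not part of the original exchange) that tallies the arcmsr abort and
reset messages in a saved log. It assumes the syslog layout quoted above. The
function name summarize_arcmsr and the category names are made up for
illustration; the match strings are copied from the quoted log lines.

```python
import re
from collections import Counter

# Matches the layout quoted above, e.g.
#   "Sep 19 16:59:25 spindle err: [3447657.821450] arcmsr: <message>"
# The source may be "arcmsr" or "arcmsr0"; other kernel lines are ignored.
LINE_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+(?P<source>arcmsr\d*):\s*(?P<msg>.*)$"
)

# Hypothetical category names mapped to substrings taken from the quoted log.
PATTERNS = {
    "abort": "abort device command",
    "bus_reset": "executing bus reset eh",
    "hw_reset": "executing hw bus reset",
    "reset_ok": "bus reset eh returns with success",
}


def summarize_arcmsr(lines):
    """Count arcmsr events and return (counts, first_uptime, last_uptime)."""
    counts = Counter()
    first = last = None
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not an arcmsr line
        msg = m.group("msg")
        for name, needle in PATTERNS.items():
            if needle in msg:
                counts[name] += 1
                t = float(m.group("uptime"))
                first = t if first is None else min(first, t)
                last = t if last is None else max(last, t)
                break
    return counts, first, last
```

Feed it the lines of a log file from each boot, for example
summarize_arcmsr(open(path)) with path pointing at wherever your syslog
stores kernel messages, and compare the counts between kernels.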
