[Bug 187231] New: kernel panic during hpsa MSI plus tg3 MSI

linux-scsi.vger.kernel.org archive mirror

From: bugzilla-daemon@bugzilla.kernel.org
To: linux-scsi@vger.kernel.org
Subject: [Bug 187231] New: kernel panic during hpsa MSI plus tg3 MSI
Date: Mon, 07 Nov 2016 13:53:06 +0000
Message-ID: <bug-187231-11613@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=187231

            Bug ID: 187231
           Summary: kernel panic during hpsa MSI plus tg3 MSI
           Product: IO/Storage
           Version: 2.5
    Kernel Version: 4.8.6
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: SCSI
          Assignee: linux-scsi@vger.kernel.org
          Reporter: kernelorg@bof.de
        Regression: No

Created attachment 243801
  --> https://bugzilla.kernel.org/attachment.cgi?id=243801&action=edit
kernel 4.8.6 .config

I'm not sure whether this is a SCSI/HPSA bug or a networking/tg3 driver bug;
both drivers appear in the stack dump. Since the trigger seems to be HPSA, I'm
reporting it as a SCSI issue here.

I've recently been trying to run mainline 4.8.x kernels, most recently 4.8.6,
on our production HP DL380 Intel servers.

Several of them show a related issue, reported in
https://bugzilla.kernel.org/show_bug.cgi?id=187221, where the HPSA driver
sometimes resets the logical device. I had already seen that with 4.4.x
kernels, and see it again with 4.8.6.

Now, specifically with 4.8.6, I _additionally_ experienced multiple full kernel
panics on the box that shows the worst of these symptoms. The same box (with
the same hpsa reset symptoms) had previously been running 4.4.x kernels without
any such panics. The panics recurred several times, roughly a day apart.

On the most recent occurrence I had the iLO SSH console running under screen
with logging enabled, and was able to capture the following panic backtrace:

[187283.903173] hpsa 0000:03:00.0: scsi 0:1:0:0: resetting logical Direct-Access     HP       LOGICAL VOLUME   RAID-5 SSDSmartPathCap- En- Exp=1
[187314.331375] sd 0:1:0:0: rejecting I/O to offline device                     
[187314.413441] sd 0:1:0:0: rejecting I/O to offline device                     
[187314.854183] sd 0:1:0:0: rejecting I/O to offline device                     
... lots of these ...
[187328.991285] sd 0:1:0:0: rejecting I/O to offline device                     
[187328.991389] sd 0:1:0:0: rejecting I/O to offline device                     
[187329.190166] sd 0:1:0:0: rejecting I/O to offline device                     
[187329.271304]  ffff88bd1a7e8000 ffff88bd1a7be500 ffff88bd7f483eb8 ffffffff8143493f
[187329.271304] Call Trace:                                                     
[187329.271310]  <IRQ>                                                          
[187329.271310]  [<ffffffffa002e332>] ? tg3_poll_msix+0xc2/0x160 [tg3]          
[187329.271311]  [<ffffffff8143493f>] do_hpsa_intr_msi+0x8f/0x1c0               
[187329.271314]  [<ffffffff81148c46>] __handle_irq_event_percpu+0x66/0xe0       
[187329.271315]  [<ffffffff81148cde>] handle_irq_event_percpu+0x1e/0x50         
[187329.271316]  [<ffffffff81148d37>] handle_irq_event+0x27/0x50                
[187329.271318]  [<ffffffff8114bda5>] handle_edge_irq+0x65/0x140                
[187329.271320]  [<ffffffff81057255>] handle_irq+0x15/0x20                      
[187329.271321]  [<ffffffff81057086>] do_IRQ+0x46/0xd0                          
[187329.271324]  [<ffffffff816dc4fc>] common_interrupt+0x7c/0x7c                
[187329.271325]  <EOI>                                                          
[187329.271338] Code: 53 48 89 fb 48 83 ec 28 4c 8b a7 5c 02 00 00 4c 8b bf 40 02 00 00 4c 8b b7 38 02 00 00 4c 8b af 4c 02 00 00 49 8b 04 24 4c 89 e7 <48> 8b 80 98 00 00 00 48 89 45 c0 49 8b 87 d0 01 00 00 48 89 45
[187329.271339] RIP  [<ffffffff81431417>] complete_scsi_command+0x37/0x8c0      
[187329.271339]  RSP <ffff88bd7f483e38>                                         
[187329.271339] CR2: 0000000000000098                                           
[187329.271341] ---[ end trace 52898916f0da5c53 ]---                            
[187329.273413] Kernel panic - not syncing: Fatal exception in interrupt        
[187330.308465] Shutting down cpus with NMI                                     
[187330.308471] Kernel Offset: disabled                                         
[187330.919173] Rebooting in 300 seconds..  
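The RIP and CR2 lines can be cross-checked against the Code: line. The kernel
wraps the first byte of the faulting instruction in <..>; here that instruction
is 48 8b 80 98 00 00 00, which encodes mov rax, [rax+0x98]. A fault address
(CR2) of 0x98 is therefore consistent with RAX being NULL when that load
executed. The Python sketch below shows the arithmetic under stated
assumptions: the helper names are hypothetical, it decodes only the single
REX.W MOV-load encoding seen here, and it is no substitute for the kernel's
scripts/decodecode or a real disassembler.

```python
from typing import List, Tuple

# x86-64 general-purpose registers indexed by their 4-bit encoding.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def split_code_line(line: str) -> Tuple[List[int], int]:
    """Return (code bytes, index of the <..>-marked faulting byte)."""
    data: List[int] = []
    fault_idx = -1
    for tok in line.split("Code:", 1)[1].split():
        if tok.startswith("<") and tok.endswith(">"):
            fault_idx = len(data)
            tok = tok[1:-1]
        data.append(int(tok, 16))
    if fault_idx < 0:
        raise ValueError("no <..> marker in Code: line")
    return data, fault_idx


def decode_mov_load_disp32(insn: List[int]) -> str:
    """Decode only 'REX.W 8B /r' with ModRM mod=10 (base register + disp32, no SIB)."""
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    if (rex & 0xF8) != 0x48 or opcode != 0x8B or (modrm >> 6) != 0b10 or (modrm & 7) == 4:
        raise ValueError("unsupported encoding for this sketch")
    dest = (((rex >> 2) & 1) << 3) | ((modrm >> 3) & 7)  # REX.R extends ModRM.reg
    base = ((rex & 1) << 3) | (modrm & 7)                # REX.B extends ModRM.rm
    disp = int.from_bytes(bytes(insn[3:7]), "little", signed=True)
    sign = "+" if disp >= 0 else "-"
    return f"mov {REGS[dest]}, [{REGS[base]}{sign}{abs(disp):#x}]"


# Code: line from this report, with the console line wraps rejoined.
CODE = ("Code: 53 48 89 fb 48 83 ec 28 4c 8b a7 5c 02 00 00 4c 8b bf 40 02 00 00 "
        "4c 8b b7 38 02 00 00 4c 8b af 4c 02 00 00 49 8b 04 24 4c 89 e7 <48> 8b "
        "80 98 00 00 00 48 89 45 c0 49 8b 87 d0 01 00 00 48 89 45")
CR2 = 0x98  # fault address reported in the oops

data, idx = split_code_line(CODE)
print(idx, decode_mov_load_disp32(data[idx:idx + 7]))  # 43 mov rax, [rax+0x98]

# CR2 = base + disp, so the base register must have held CR2 - disp.
disp = int.from_bytes(bytes(data[idx + 3:idx + 7]), "little", signed=True)
print(f"implied RAX = {CR2 - disp:#x}")  # implied RAX = 0x0
```

Hand-checking a single instruction keeps the sketch short; for the surrounding
bytes (for example the earlier 49 8b 04 24, mov rax, [r12], which loads the
pointer that turns out to be NULL) a proper disassembler is the better tool.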

I'll attach my kernel .config.

As this is a production system and so far the panics have only occurred under
our usual production load (webserver and DB KVM guests), there's not much
testing or bisecting I can do, but I didn't want to leave the issue unreported,
either.

Hope this helps somebody. If there is any more information that would be
useful, just ask.

(I'm back to running 4.4.x)

