* Re: Kernel panic w/ message request_threaded_irq -> qla2x00_request_irqs -> qla2x00_probe_one -> mod_timer
From: TomK @ 2019-09-29 17:04 UTC
To: Laurence Oberman, Bart Van Assche, linux-scsi
On 5/12/2019 11:56 PM, TomK wrote:
> On 5/3/2019 9:07 AM, TomK wrote:
>> On 5/2/2019 10:00 PM, Laurence Oberman wrote:
>>> On Sun, 2019-04-28 at 12:11 -0400, TomK wrote:
>>>> On 4/15/2019 10:26 PM, TomK wrote:
>>>>> On 4/15/2019 3:35 PM, Laurence Oberman wrote:
>>>>>> On Mon, 2019-04-15 at 08:39 -0700, Bart Van Assche wrote:
>>>>>>> On Mon, 2019-04-15 at 08:55 -0400, Laurence Oberman wrote:
>>>>>>>> On Sun, 2019-04-14 at 23:25 -0400, TomK wrote:
>>>>>>>>> Hey All,
>>>>>>>>>
>>>>>>>>> I'm getting a kernel panic on a Gigabyte GA-890XA-UD3
>>>>>>>>> motherboard that I've got a QLE2464 card in as a target (FC).
>>>>>>>>> The kernel has been crashing / panicking in the last 1-2
>>>>>>>>> months about once a week. Before that, it was rock solid for
>>>>>>>>> 4-5 years. I've upgraded to kernel 4.18.19 but that hasn't
>>>>>>>>> made much of a difference. Since the message includes
>>>>>>>>> qla2x00_request_irqs I thought I would try here first.
>>>>>>>>>
>>>>>>>>> Tried to get more info on this but:
>>>>>>>>>
>>>>>>>>> 1) Keyboard doesn't work and locks up when the panic occurs.
>>>>>>>>> No USB ports work. Tried the PS/2 port but nothing.
>>>>>>>>>
>>>>>>>>> 2) Unable to capture a kdump. Can't get to the kdump vmcore
>>>>>>>>> due to 1).
>>>>>>>>>
>>>>>>>>> The two screenshots are pretty much all I can capture. Tried
>>>>>>>>> things like clocksource=rtc in the kernel parms and disabling
>>>>>>>>> hpet1 but apparently I haven't disabled it everywhere since
>>>>>>>>> it still shows up.
>>>>>>>>>
>>>>>>>>> Wondering if anyone recognizes these messages or has any idea
>>>>>>>>> what could be the issue here? Even a hint would be
>>>>>>>>> appreciated.
>>>>>>>>>
>>>>>>>> Hello Tom
>>>>>>>> I have had similar issues and reported them to Himanshu@Cavium.
>>>>>>>> I have kept all my target servers at kernel 4.5 as it has been
>>>>>>>> the only version that has always been stable.
>>>>>>>> If your motherboard has an NMI (virtual or physical) set all of
>>>>>>>> these in /etc/sysctl.conf
>>>>>>>> Run sysctl -a;dracut -f and reboot
>>>>>>>>
>>>>>>>> kernel.nmi_watchdog = 1
>>>>>>>> kernel.panic_on_io_nmi = 1
>>>>>>>> kernel.panic_on_unrecovered_nmi =
>>>>>>>> kernel.unknown_nmi_panic = 1
>>>>>>>>
>>>>>>>> When the issue shows up press the virtual/physical NMI
>>>>>>>>
>>>>>>>> This is with the assumption that generic kdump is properly
>>>>>>>> set up, that dmesg | grep crash shows memory reserved by the
>>>>>>>> crashkernel, and that you have tested kdump manually.
>>>>>>>>
>>>>>>>> Another option is to use a USB serial port to capture the full
>>>>>>>> log if you cannot get kdump to work.
>>>>>>> That approach may provide further evidence about kernel bugs but
>>>>>>> it is not guaranteed that that approach will lead to a solution.
>>>>>>> It would help if either or both of you could do the following on
>>>>>>> a test system:
>>>>>>> * Check out branch qla2xxx-for-next of my kernel repo on github
>>>>>>> (https://github.com/bvanassche/linux/tree/qla2xxx-for-next).
>>>>>>> * Enable lockdep and KASAN in the kernel config
>>>>>>> (CONFIG_PROVE_LOCKING and CONFIG_KASAN).
>>>>>>> * Build and install that kernel.
>>>>>>> * Run your favorite workload.
>>>>>>>
>>>>>>> Please note that the qla2xxx-for-next branch is based on the
>>>>>>> v5.1-rc1 kernel and hence should not be installed on any
>>>>>>> production system.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Bart.
>>>>>> Hello Bart
>>>>>> OK, I will get to this by Thursday, won't be able to change the
>>>>>> target server kernel until then.
>>>>>> Regards
>>>>>> Laurence
>>>>>>
>>>>> Same. I'll try this out closer to the weekend.
>>>>>
>>>>> Not an NMI motherboard. This is a 9-10 year old AMD board meant as
>>>>> a desktop or home server.
>>>>>
>>>>> I'll have to read more about the USB Serial port to capture further
>>>>> info. That's interesting.
>>>>>
>>>>> For the time being, I've disabled HPET in BIOS. ( Appears the
>>>>> kernel boot parameter method wasn't enough. )
>>>>>
>>>>>
>>>>
>>>> Hey Guys,
>>>> Did some of what you suggested, including the USB serial setup:
>>>> 1) One of DB9 RS232 Serial Null Modem Cable F/F
>>>> 2) Two of USB to RS232 Serial Port DB9 9 Pin Male
>>>> however, when the kernel came down it took the USB support with it
>>>> and so minicom went offline:
>>>> CTRL-A Z for help |115200 8N1 | NOR | Minicom 2.6.2 | VT102 | Offline
>>>> But I did enable full logging for the QLA module:
>>>> echo 0x7fffffff > /sys/module/qla2xxx/parameters/ql2xextended_error_logging
>>>> Did all that, minus the Kernel v5.1-rc1 implementation, and this is
>>>> what was picked up from the minicom USB to Serial capture before
>>>> things went south:
>>>> 1235905 qla2xxx [0000:04:00.0]-e818: is_send_status=1, cmd->bufflen=512, cmd->sg_cnt=1, cmd->dma_data_direction=1 se_cmd[000000009c9ea758] qp 0
>>>> 1235906 qla2xxx [0000:04:00.0]-e818: is_send_status=1, cmd->bufflen=4096, cmd->sg_cnt=0, cmd->dma_data_direction=2 se_cmd[0000000096ae11b7] qp 0
>>>> 1235907 qla2xxx [0000:04:00.0]-e818: is_send_status=1, cmd->bufflen=20480, cmd->sg_cnt=0, cmd->dma_data_direction=2 se_cmd[000000001738f793] qp 0
>>>> 1235908 qla2xxx [0000:04:00.0]-e818: is_send_status=1, cmd->bufflen=20480, cmd->sg_cnt=0, cmd->dma_data_direction=2 se_cmd[00000000e8160a90] qp 0
>>>> 1235909 Detected MISCOMPARE for addr: 0000000033045258 buf: 00000000f9849912
>>>> 1235910 Target/fileio: Send MISCOMPARE check condition and sense
>>>> 1235911 qla2xxx [0000:04:00.0]-e818: is_send_status=1, cmd->bufflen=512, cmd->sg_cnt=0, cmd->dma_data_direction=2 se_cmd[00000000363ae214] qp 0
>>>> 1235912 qla2xxx [0000:04:00.0]-e817: Skipping EXPLICIT_CONFORM and CTIO7_FLAGS_CONFORM_REQ for FCP READ w/ non GOOD status
>>>> 1235913 qla2xxx [0000:04:00.0]-e874:2: qlt_free_cmd: se_cmd[000000001db805fd] ox_id 00c8
>>>> 1235914 qla2xxx [0000:04:00.0]-e872:2: qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 00db
>>>> 1235915 qla2xxx [0000:04:00.0]-e872:2: qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 00dc
>>>> 1235916 qla2xxx [0000:04:00.0]-e874:2: qlt_free_cmd: se_cmd[00000000f67a701f] ox_id 00c9
>>>> 1235917 qla2xxx [0000:04:00.0]-e872:2: qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 00dd
>>>> 1235918 qla2xxx [0000:04:00.0]-e872:2: qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 00de
>>>>
>>>> On an earlier crash, captured the attached image. This time there
>>>> was nothing on the monitor and the keyboard didn't refresh it. No
>>>> signal.
>>>> When looking this up, closest I could see online is the following:
>>>>
>>>> https://target-devel.vger.kernel.narkive.com/XiM5Csx8/luns-become-unavailable-with-current-git-head
>>>>
>>>> They too run ESXi.
>>>> To read the file I used the AnsiEsc plugin for VIM:
>>>> https://www.vim.org/scripts/script.php?script_id=302
>>>> This started to occur once I had a VMware based MySQL and PostgreSQL
>>>> cluster configured. Takes a few days for the issue to occur so from
>>>> that perspective, appears to be memory related.
>>>> Firmware that I'm using is:
>>>> supported_classes = "Class 3"
>>>> supported_speeds = "1 Gbit, 2 Gbit, 4 Gbit"
>>>> symbolic_name = "QLE2464 FW:v8.04.00 DVR:v10.00.00.05-k"
>>>> Targetcli, rtslib and configshell versions I'm using are:
>>>>
>>>> # rpm -aq|grep -Ei "targetcli|rtslib|configshell"
>>>> python-rtslib-3.0.pre4.9~g6fd0bbf-1.el6.noarch
>>>> python-configshell-1.1.fb4-1.el6.noarch
>>>> targetcli-3.0.pre4.5~ga125182-1.el6.noarch
>>>>
>>>>
>>>> --
>>>> Thx,
>>>> TK.
>>>
>>> I missed this email, been buried in customer cases.
>>> I also need to still run some tests.
>>> Sorry, reading now
>>>
>>
>> No worries. Would be very interested to see what you find.
>>
>> In the meantime later tonight I'll be trying to 1) find more recent
>> firmware for the card, 2) try to use the newer kernel v5.1-rc1
>> outlined above 3) try the 4.5 kernel later on the weekend.
>>
>>
>
>
> Trying with this F/W and driver:
>
>
> symbolic_name = "QLE2464 FW:v8.06.02 DVR:v10.00.00.07-k-debug"
>
> [root@mbpc-pc 05-03-2019]# strings /lib/firmware/ql2400_fw.bin|grep -Ei copyright
> COPYRIGHT 2016 QLOGIC CORPORATION ISP24xx Firmware Version 8.06.02 $
> [root@mbpc-pc 05-03-2019]#
>
>
>
> The latest 'boot' firmware I can find is the following:
>
>
>
> [root@mbpc-pc 05-03-2019]# strings Q24A7232.BIN |grep -Ei copyright
> Copyright (C) QLogic Corporation 1993-2015. All rights reserved.
> Copyright (C) QLogic Corporation 1993-2015. All rights reserved.
> COPYRIGHT 2015 QLOGIC CORPORATION ISP24xx Firmware Version 8.01.02 $
> You have new mail in /var/spool/mail/root
> [root@mbpc-pc 05-03-2019]#
>
> This is what shows up as loaded during system boot.
>
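Side note on the firmware check quoted above: the version number can be
pulled straight out of the strings output instead of eyeballing the
copyright banner. A minimal sketch, assuming the image embeds a
"Firmware Version x.y.z" banner the way the two files above do.
fw_versions is a name made up for illustration, not part of any QLogic
tooling.

```shell
#!/bin/sh
# Hypothetical helper (not from any QLogic tool): read `strings` output on
# stdin and print every version number that follows "Firmware Version".
# Assumes the banner format seen in the quoted output above.
fw_versions() {
    sed -n 's/.*Firmware Version \([0-9][0-9.]*\).*/\1/p'
}

# Usage, with the file path taken from the quoted command:
#   strings /lib/firmware/ql2400_fw.bin | fw_versions
```

Lines without the banner print nothing, so this can be pointed at any
image file without extra filtering.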
Resolved this a few months back. All I did was upgrade the memory on the
storage server from 4GB to 8GB, then later to 16GB.
Since the memory upgrade, the kernel panic hasn't shown up and
everything's been stable.
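
Since adding RAM is what made the panics stop, low memory on the target
looks like the likely trigger, though I never confirmed that directly.
For anyone who wants to check before buying RAM, here is a minimal
sketch that logs available memory over time. It assumes a kernel that
reports MemAvailable in /proc/meminfo; the function names and the log
path in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/meminfo-formatted text on stdin and
# print the MemAvailable value in kB.
mem_available_kb() {
    awk '$1 == "MemAvailable:" { print $2; exit }'
}

# Hypothetical helper: print one timestamped sample (UTC) for a log file.
log_mem_sample() {
    printf '%s MemAvailable=%s kB\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        "$(mem_available_kb < /proc/meminfo)"
}

# Usage: sample once a minute; the log path is only an example.
#   while true; do log_mem_sample >> /var/log/mem_samples.log; sleep 60; done
```

If the last samples before a crash trend toward zero, that would back
up the memory theory; if they don't, the cause is somewhere else.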
--
Thx,
TK.