* [Bug 187221] New: HPSA resetting logical / reset logical
@ 2016-11-07 13:39 bugzilla-daemon
2016-11-16 6:19 ` [Bug 187221] " bugzilla-daemon
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: bugzilla-daemon @ 2016-11-07 13:39 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=187221
Bug ID: 187221
Summary: HPSA resetting logical / reset logical
Product: IO/Storage
Version: 2.5
Kernel Version: 4.4.x, 4.8.x
Hardware: Intel
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: SCSI
Assignee: linux-scsi@vger.kernel.org
Reporter: kernelorg@bof.de
Regression: No
I have about 20 HP DL 380 (some 360) servers, from Gen7 to Gen9, using the HPSA
driver with various smartarray controllers.
For a long time I've been running mainline 3.14 kernels, without any issues.
Some time ago I updated to mainline 4.4.x, up to the most recent 4.4.30.
I have now noticed the following kind of message in the logs of 6 of them,
most frequently on one server:
2016-11-06T22:09:50.227592+01:00 HOST kernel: [68853.338610] hpsa 0000:03:00.0: scsi 0:1:0:0: resetting logical Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
2016-11-06T22:10:18.713759+01:00 HOST kernel: [68881.832436] hpsa 0000:03:00.0: scsi 0:1:0:0: reset logical completed successfully Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
I see such messages, _usually_ with only 1 second between the "resetting" and
"reset completed" lines, on machines with the following controller and
controller firmware combinations:
Machines  Controller  Firmware
1         P410i       5.14
1         P420i       5.42
2         P440ar      3.02
1         P440ar      3.56
1         P440ar      4.02
The machine whose messages are quoted above is a P440ar with firmware 3.02.
There, unlike on the other machines, the reset operation sometimes takes up to
20 seconds, and all I/O stalls in the meantime.
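To quantify how long each reset takes, the two message types can be paired per device and the timestamps subtracted. The following is only a minimal sketch: it assumes each kernel message has been unwrapped onto a single line in the syslog format quoted above, and the names `LINE_RE` and `reset_durations` are made up for this illustration.

```python
import re
from datetime import datetime

# Assumed layout (one message per line, unwrapped):
# <ISO timestamp> <host> kernel: [<uptime>] hpsa <pci>: scsi <h:c:t:l>: <event> ...
LINE_RE = re.compile(
    r"^(?P<ts>\S+) \S+ kernel: \[\s*[\d.]+\] hpsa (?P<pci>\S+): "
    r"scsi (?P<dev>[\d:]+): "
    r"(?P<event>resetting logical|reset logical completed successfully)"
)


def reset_durations(lines):
    """Return (pci, dev, start, seconds) for every completed reset."""
    pending = {}   # (pci, dev) -> time of the last "resetting logical"
    episodes = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not an hpsa reset message
        key = (m["pci"], m["dev"])
        ts = datetime.fromisoformat(m["ts"])
        if m["event"] == "resetting logical":
            pending[key] = ts
        elif key in pending:
            start = pending.pop(key)
            episodes.append((*key, start, (ts - start).total_seconds()))
    return episodes


if __name__ == "__main__":
    import sys
    for pci, dev, start, secs in reset_durations(sys.stdin):
        print(f"{start.isoformat()} {pci} scsi {dev}: reset took {secs:.1f}s")
```

Fed the two messages quoted above, this reports a single episode of about 28.5 seconds on scsi 0:1:0:0.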
I also tested with 4.8.x kernels, and saw the same symptoms there. I'm somewhat
sure that I did not see these with 3.14 kernels. This morning I rebooted the
most problematic box to 3.14.79, and so far it has been silent. I'll report if
that changes.
Apart from these log lines, there is nothing unusual to be found: no iLO or
IML notifications, no other kernel messages, no drive failures, no SMART
alerts, and no performance regressions.
--
You are receiving this mail because:
You are the assignee for the bug.
* [Bug 187221] HPSA resetting logical / reset logical
From: bugzilla-daemon @ 2016-11-16 6:19 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=187221
--- Comment #1 from Patrick Schaaf <kernelorg@bof.de> ---
Some more info on my problematic machine / further diagnosing is in
https://bugzilla.kernel.org/show_bug.cgi?id=187231
Summary: at least with the P440ar controllers, such 10-30 second "logical
reset" episodes eventually reveal an underlying faulty drive, and go away when
that drive is replaced.
But there is no up-front information in the "logical reset" that would permit
pinpointing the drive on the first round.
* [Bug 187221] HPSA resetting logical / reset logical
From: bugzilla-daemon @ 2021-03-29 2:18 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=187221
vsudo (vsudoblog@gmail.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |vsudoblog@gmail.com
--- Comment #2 from vsudo (vsudoblog@gmail.com) ---
I have an HPE Gen9 server using RAID5. I get this error, and the distributed
storage on this server is resynced. I don't understand why the OS does not
cope with this, since a failed disk in a RAID5 array should be normal.
[Sun Mar 28 09:47:04 2021] hpsa 0000:03:00.0: device is ready.
[Sun Mar 28 09:47:04 2021] hpsa 0000:03:00.0: scsi 0:1:0:4: reset logical completed successfully Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
[Sun Mar 28 09:54:54 2021] hpsa 0000:03:00.0: scsi 0:1:0:4: resetting logical Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
[Sun Mar 28 09:55:50 2021] hpsa 0000:03:00.0: device is ready.
[Sun Mar 28 09:55:50 2021] hpsa 0000:03:00.0: scsi 0:1:0:4: reset logical completed successfully Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
[Sun Mar 28 09:56:35 2021] hpsa 0000:03:00.0: scsi 0:1:0:4: resetting logical Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
[Sun Mar 28 09:57:01 2021] hpsa 0000:03:00.0: device is ready.
[Sun Mar 28 09:57:01 2021] hpsa 0000:03:00.0: scsi 0:1:0:4: reset logical completed successfully Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
[Sun Mar 28 10:00:56 2021] hpsa 0000:03:00.0: scsi 0:1:0:4: resetting logical Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
[Sun Mar 28 10:00:57 2021] hpsa 0000:03:00.0: device is ready.
[Sun Mar 28 10:00:57 2021] hpsa 0000:03:00.0: scsi 0:1:0:4: reset logical completed successfully Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
[Sun Mar 28 10:05:13 2021] hpsa 0000:03:00.0: scsi 0:1:0:4: resetting logical Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
[Sun Mar 28 10:07:16 2021] INFO: task scsi_eh_0:473 blocked for more than 120 seconds.
[Sun Mar 28 10:07:16 2021] Not tainted 4.15.0-55-generic #60-Ubuntu
[Sun Mar 28 10:07:16 2021] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Mar 28 10:07:16 2021] scsi_eh_0 D 0 473 2 0x80000000
[Sun Mar 28 10:07:16 2021] Call Trace:
[Sun Mar 28 10:07:16 2021] __schedule+0x291/0x8a0
[Sun Mar 28 10:07:16 2021] ? dev_printk_emit+0x4a/0x70
[Sun Mar 28 10:07:16 2021] schedule+0x2c/0x80
[Sun Mar 28 10:07:16 2021] schedule_timeout+0x1cf/0x350
[Sun Mar 28 10:07:16 2021] ? __dev_printk+0x3c/0x80
[Sun Mar 28 10:07:16 2021] ? dev_printk+0x56/0x80
[Sun Mar 28 10:07:16 2021] io_schedule_timeout+0x1e/0x50
[Sun Mar 28 10:07:16 2021] wait_for_completion_io+0xba/0x140
[Sun Mar 28 10:07:16 2021] ? wake_up_q+0x80/0x80
[Sun Mar 28 10:07:16 2021] hpsa_scsi_do_simple_cmd.isra.60+0xb3/0xf0 [hpsa]
[Sun Mar 28 10:07:16 2021] hpsa_eh_device_reset_handler+0x3e4/0x7b0 [hpsa]
[Sun Mar 28 10:07:16 2021] ? __switch_to_asm+0x40/0x70
[Sun Mar 28 10:07:16 2021] ? __switch_to_asm+0x34/0x70
[Sun Mar 28 10:07:16 2021] ? scsi_device_put+0x2b/0x30
[Sun Mar 28 10:07:16 2021] scsi_eh_ready_devs+0x333/0xbf0
[Sun Mar 28 10:07:16 2021] ? __pm_runtime_resume+0x5b/0x80
[Sun Mar 28 10:07:16 2021] ? scsi_try_target_reset+0x90/0x90
[Sun Mar 28 10:07:16 2021] scsi_error_handler+0x4c3/0x5b0
[Sun Mar 28 10:07:16 2021] kthread+0x121/0x140
[Sun Mar 28 10:07:16 2021] ? scsi_eh_get_sense+0x200/0x200
[Sun Mar 28 10:07:16 2021] ? kthread_create_worker_on_cpu+0x70/0x70
[Sun Mar 28 10:07:16 2021] ret_from_fork+0x35/0x40
[Sun Mar 28 10:07:16 2021] INFO: task systemd-journal:109135 blocked for more than 120 seconds.
[Sun Mar 28 10:07:16 2021] Not tainted 4.15.0-55-generic #60-Ubuntu
[Sun Mar 28 10:07:16 2021] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Mar 28 10:07:16 2021] systemd-journal D 0 109135 1 0x00000320
[Sun Mar 28 10:07:16 2021] Call Trace:
[Sun Mar 28 10:07:16 2021] __schedule+0x291/0x8a0
[Sun Mar 28 10:07:16 2021] ? intel_pstate_update_pstate+0x40/0x40
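A log like the one above can be scanned for resets that never report completion, which is the situation the hung-task trace points at. This is a rough sketch only: it assumes human-readable timestamps in the `[Sun Mar 28 10:05:13 2021]` style shown here, one message per line, and an English/C locale for the day and month names. `DMESG_RE`, `TS_FORMAT` and `reset_episodes` are names invented for this example.

```python
import re
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"   # e.g. "Sun Mar 28 10:05:13 2021"

# Assumed layout: [<timestamp>] hpsa <pci>: scsi <h:c:t:l>: <event> ...
DMESG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] hpsa \S+ scsi (?P<dev>[\d:]+): "
    r"(?P<event>resetting logical|reset logical completed successfully)"
)


def reset_episodes(lines):
    """Return (dev, start, seconds) tuples; seconds is None when the
    reset has no matching completion message in the input."""
    open_since = {}   # dev -> time of the still-unfinished reset
    episodes = []
    for line in lines:
        m = DMESG_RE.match(line)
        if m is None:
            continue  # "device is ready", call traces, other messages
        ts = datetime.strptime(m["ts"], TS_FORMAT)
        dev = m["dev"]
        if m["event"] == "resetting logical":
            open_since[dev] = ts
        elif dev in open_since:
            start = open_since.pop(dev)
            episodes.append((dev, start, (ts - start).total_seconds()))
    # Whatever is left was started but never completed.
    for dev, start in open_since.items():
        episodes.append((dev, start, None))
    return episodes


if __name__ == "__main__":
    import sys
    for dev, start, secs in reset_episodes(sys.stdin):
        state = "NOT COMPLETED" if secs is None else f"{secs:.0f}s"
        print(f"scsi {dev}: reset started {start}: {state}")
```

On the log above this gives resets of 56, 26 and 1 seconds on scsi 0:1:0:4, followed by the reset started at 10:05:13 with no completion. That is consistent with scsi_eh_0 being blocked in hpsa_eh_device_reset_handler two minutes later.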
My website: https://vsudo.blog
* [Bug 187221] HPSA resetting logical / reset logical
From: bugzilla-daemon @ 2021-03-29 2:19 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=187221
--- Comment #3 from vsudo (vsudoblog@gmail.com) ---
Update my website is: https://vsudo.net
* [Bug 187221] HPSA resetting logical / reset logical
From: bugzilla-daemon @ 2021-03-29 2:20 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=187221
--- Comment #4 from vsudo (vsudoblog@gmail.com) ---
(In reply to vsudo from comment #2)
> My website: https://vsudo.net