* modprobe mlx5_core on OCI bare-metal instance causes unrecoverable hang and I/O error
From: Mitchell Augustin @ 2025-02-05 23:09 UTC
To: saeedm, leon, tariqt, andrew+netdev, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-rdma,
linux-kernel
Cc: Talat Batheesh, Feras Daoud
Hello,
I have identified a bug in the mlx5_core module, or some related component.
Doing the following on a freshly provisioned Oracle Cloud bare-metal
node with this configuration [0] will reliably cause the entire
instance to become unresponsive:
rmmod mlx5_ib; rmmod mlx5_core; modprobe mlx5_core
This also produces the following output:
[ 331.267175] I/O error, dev sda, sector 35602992 op 0x0:(READ) flags 0x80700 phys_seg 33 prio class 0
[ 331.376575] I/O error, dev sda, sector 35600432 op 0x0:(READ) flags 0x84700 phys_seg 320 prio class 0
[ 331.487509] I/O error, dev sda, sector 35595064 op 0x0:(READ) flags 0x80700 phys_seg 159 prio class 0
[ 528.386085] INFO: task kworker/u290:0:453 blocked for more than 122 seconds.
Not tainted 6.14.0-rc1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 528.615268] INFO: task kworker/u290:3:820 blocked for more than 123 seconds.
Not tainted 6.14.0-rc1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 528.843577] INFO: task jbd2/sda1-8:1128 blocked for more than 123 seconds.
Not tainted 6.14.0-rc1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 529.069854] INFO: task systemd-journal:1218 blocked for more than 123 seconds.
Not tainted 6.14.0-rc1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 529.300407] INFO: task kworker/u290:4:1828 blocked for more than 123 seconds.
Not tainted 6.14.0-rc1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 529.529973] INFO: task rs:main Q:Reg:2184 blocked for more than 124 seconds.
Not tainted 6.14.0-rc1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 529.758690] INFO: task gomon:2258 blocked for more than 124 seconds.
Not tainted 6.14.0-rc1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 529.978867] INFO: task kworker/u290:5:3255 blocked for more than 124 seconds.
Not tainted 6.14.0-rc1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 651.265588] INFO: task kworker/u290:0:453 blocked for more than 245 seconds.
Not tainted 6.14.0-rc1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 651.494126] INFO: task kworker/u290:3:820 blocked for more than 245 seconds.
Not tainted 6.14.0-rc1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
I tried using the function_graph tracer to identify whether any
functions within mlx5_core were executing for an excessive amount of
time, but found nothing conclusive.
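In case it's useful, this is roughly the tracer setup I used (a minimal
sketch; it assumes tracefs is mounted at /sys/kernel/tracing, and the
glob and output path may need adjusting):

    cd /sys/kernel/tracing
    echo 'mlx5_*' > set_graph_function   # restrict the graph to mlx5 entry points
    echo function_graph > current_tracer
    echo 1 > tracing_on
    modprobe mlx5_core                   # reproduce while tracing
    echo 0 > tracing_on
    cat trace > /dev/shm/mlx5_graph.txt  # keep the capture on tmpfs, off the failing disk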
Attached[1] is the stack trace I see when I force the kernel to panic
after a hang has been detected. I did this three times, and the traces
were similar in that they all referred to ext4_* functions, which lines
up with the I/O errors I see each time.
I should also note that I was once able to trigger a similar I/O error
on a DGX A100 (running the Ubuntu 6.8.0-52-generic kernel, with modules
installed via a repackaged version of DOCA-OFED), but I have not been
able to reliably reproduce the issue on that machine with the pure
upstream inbox drivers the way I can on the OCI instance. (I was also
still able to interact with the A100, but attempting to run any command
resulted in a "command not found" error, which again fits the idea that
this was somehow interfering with ext4.)
Has anything like this been observed by other users?
Please let me know if there is anything else I should do or provide to
help debug this issue, or if there is already a known root cause.
[0]
System specs:
OCI bare-metal Node, BM.Optimized3.36 shape with RoCE connectivity to
another identical node
Kernel: mainline @ 6.14.0-rc1 with this config:
https://pastebin.ubuntu.com/p/5Jm2WFZY62/
ibstat output: https://pastebin.ubuntu.com/p/S5dfFSdDxd/
lscpu output: https://pastebin.ubuntu.com/p/dfPyYQWnhX/
[1]
https://pastebin.ubuntu.com/p/kxw2dsmwFV/
--
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering
* Re: modprobe mlx5_core on OCI bare-metal instance causes unrecoverable hang and I/O error
From: Jason Gunthorpe @ 2025-02-07 15:54 UTC
To: Mitchell Augustin
Cc: saeedm, leon, tariqt, andrew+netdev, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-rdma,
linux-kernel, Talat Batheesh, Feras Daoud
On Wed, Feb 05, 2025 at 05:09:13PM -0600, Mitchell Augustin wrote:
> Hello,
>
> I have identified a bug in the mlx5_core module, or some related component.
>
> Doing the following on a freshly provisioned Oracle Cloud bare-metal
> node with this configuration [0] will reliably cause the entire
> instance to become unresponsive:
>
> rmmod mlx5_ib; rmmod mlx5_core; modprobe mlx5_core
>
> This also produces the following output:
>
> [ 331.267175] I/O error, dev sda, sector 35602992 op 0x0:(READ) flags 0x80700 phys_seg 33 prio class 0
Is it using iscsi/srp/nfs/etc for any filesystems?
Jason
* Re: modprobe mlx5_core on OCI bare-metal instance causes unrecoverable hang and I/O error
From: Mitchell Augustin @ 2025-02-07 16:02 UTC
To: Jason Gunthorpe
Cc: saeedm, leon, tariqt, andrew+netdev, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-rdma,
linux-kernel, Talat Batheesh, Feras Daoud
> Is it using iscsi/srp/nfs/etc for any filesystems?
Yes, dev sda is using iSCSI:
ubuntu@inst-v4ovk-mitchell-instance-pool-20250205-1119:~$ sudo iscsiadm -m session -P 3
iSCSI Transport Class version 2.0-870
version 2.1.9
Target: iqn.2015-02.oracle.boot:uefi (non-flash)
Current Portal: 169.254.0.2:3260,1
Persistent Portal: 169.254.0.2:3260,1
**********
Interface:
**********
Iface Name: default
Iface Transport: tcp
Iface Initiatorname: iqn.2010-04.org.ipxe:080020ff-ffff-ffff-ffff-a8698c179e5c
Iface IPaddress: 10.0.0.254
Iface HWaddress: default
Iface Netdev: default
SID: 1
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN
Internal iscsid Session State: NO CHANGE
*********
Timeouts:
*********
Recovery Timeout: 6000
Target Reset Timeout: 30
LUN Reset Timeout: 30
Abort Timeout: 15
*****
CHAP:
*****
username: <empty>
password: ********
username_in: <empty>
password_in: ********
************************
Negotiated iSCSI params:
************************
HeaderDigest: None
DataDigest: None
MaxRecvDataSegmentLength: 262144
MaxXmitDataSegmentLength: 8192
FirstBurstLength: 65536
MaxBurstLength: 262144
ImmediateData: Yes
InitialR2T: Yes
MaxOutstandingR2T: 1
************************
Attached SCSI devices:
************************
Host Number: 0 State: running
scsi0 Channel 00 Id 0 Lun: 0
scsi0 Channel 00 Id 0 Lun: 1
Attached scsi disk sda State: running
On Fri, Feb 7, 2025 at 9:55 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Feb 05, 2025 at 05:09:13PM -0600, Mitchell Augustin wrote:
> > Hello,
> >
> > I have identified a bug in the mlx5_core module, or some related component.
> >
> > Doing the following on a freshly provisioned Oracle Cloud bare-metal
> > node with this configuration [0] will reliably cause the entire
> > instance to become unresponsive:
> >
> > rmmod mlx5_ib; rmmod mlx5_core; modprobe mlx5_core
> >
> > This also produces the following output:
> >
> > [ 331.267175] I/O error, dev sda, sector 35602992 op 0x0:(READ) flags 0x80700 phys_seg 33 prio class 0
>
> Is it using iscsi/srp/nfs/etc for any filesystems?
>
> Jason
--
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering
* Re: modprobe mlx5_core on OCI bare-metal instance causes unrecoverable hang and I/O error
From: Jason Gunthorpe @ 2025-02-07 19:01 UTC
To: Mitchell Augustin
Cc: saeedm, leon, tariqt, andrew+netdev, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-rdma,
linux-kernel, Talat Batheesh, Feras Daoud
On Fri, Feb 07, 2025 at 10:02:46AM -0600, Mitchell Augustin wrote:
> > Is it using iscsi/srp/nfs/etc for any filesystems?
>
> Yes, dev sda is using iSCSI:
If you remove the driver that is providing transport for your
filesystem, the system will hang like you showed.
It can be done, but the process sequencing the load/unload has to be
entirely contained to a tmpfs so it doesn't become blocked on I/O that
cannot complete.
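Something like this untested sketch is the idea (the busybox and
module paths are examples for your setup, not a recipe):

    mkdir /ram && mount -t tmpfs none /ram
    cp /bin/busybox /ram/    # statically linked, so nothing pages in from the rootfs mid-sequence
    cp /lib/modules/$(uname -r)/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko /ram/
    cd /ram
    ./busybox sh -c './busybox rmmod mlx5_ib; ./busybox rmmod mlx5_core; ./busybox insmod mlx5_core.ko'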
Jason
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: modprobe mlx5_core on OCI bare-metal instance causes unrecoverable hang and I/O error
2025-02-07 19:01 ` Jason Gunthorpe
@ 2025-02-07 19:24 ` Mitchell Augustin
2025-02-07 19:33 ` Saeed Mahameed
0 siblings, 1 reply; 6+ messages in thread
From: Mitchell Augustin @ 2025-02-07 19:24 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: saeedm, leon, tariqt, andrew+netdev, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-rdma,
linux-kernel, Talat Batheesh, Feras Daoud
*facepalm*
Thanks, I can't believe that wasn't my first thought as soon as I
learned these instances were using iSCSI. That's almost certainly what
is happening on this OCI instance, since the host adapter for its
iSCSI transport is a ConnectX card.
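For anyone curious, a quick way to confirm that (sketch; the interface
name below is just an example, use whatever the first command reports):

    ip route get 169.254.0.2   # shows the egress interface for the iSCSI portal
    ethtool -i enp69s0f0       # expect "driver: mlx5_core" if it is the ConnectX port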
The fact that I was able to see similar behavior once on a machine
booted from a local disk (in the A100 test I mentioned) is still
confusing, though. I'll update this thread if I can figure out a
reliable way to reproduce that behavior.
On Fri, Feb 7, 2025 at 1:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Feb 07, 2025 at 10:02:46AM -0600, Mitchell Augustin wrote:
> > > Is it using iscsi/srp/nfs/etc for any filesystems?
> >
> > Yes, dev sda is using iSCSI:
>
> If you remove the driver that is providing transport for your
> filesystem, the system will hang like you showed.
>
> It can be done, but the process sequencing the load/unload has to be
> entirely contained to a tmpfs so it doesn't become blocked on I/O that
> cannot complete.
>
> Jason
--
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering
* Re: modprobe mlx5_core on OCI bare-metal instance causes unrecoverable hang and I/O error
2025-02-07 19:24 ` Mitchell Augustin
@ 2025-02-07 19:33 ` Saeed Mahameed
0 siblings, 0 replies; 6+ messages in thread
From: Saeed Mahameed @ 2025-02-07 19:33 UTC
To: Mitchell Augustin
Cc: Jason Gunthorpe, leon, tariqt, andrew+netdev, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-rdma,
linux-kernel, Talat Batheesh, Feras Daoud
On 07 Feb 13:24, Mitchell Augustin wrote:
>*facepalm*
>
>Thanks, I can't believe that wasn't my first thought as soon as I
>learned these instances were using iSCSI. That's almost certainly what
>is happening on this OCI instance, since the host adapter for its
>iSCSI transport is a ConnectX card.
>
>The fact that I was able to see similar behavior once on a machine
>booted from a local disk (in the A100 test I mentioned) is still
>confusing, though. I'll update this thread if I can figure out a
>reliable way to reproduce that behavior.
>
BTW, I have seen this happen in a few virtualized environments as well,
where the VM storage is network/RDMA backed by the host driver/network.
When the host driver restarts (which is normal behavior for a PV setup),
the VM also gets storage-related timeouts and soft lockups. Graceful
shutdown needs to be handled inside the network-backed block devices,
IMHO.
-Saeed.