Re: Kernel panic triggered while removing mlx5

public inbox for linux-rdma@vger.kernel.org

* Re: Kernel panic triggered while removing mlx5_core devices from the pci bus
From: Shay Drori @ 2024-05-30  8:09 UTC
  To: Berger, Michal, netdev@vger.kernel.org, moshe@nvidia.com,
	linux-rdma@vger.kernel.org, phaddad

Hi Michal

Can you please try the change below [1]?
In addition, the bug/trace is in the mlx5_ib driver code, so I am CCing the rdma mailing list
(linux-rdma@vger.kernel.org).

Thanks
Shay Drory

[1]
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -614,7 +614,6 @@ int mlx5_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc)
         int soft_polled = 0;
         int npolled;
-       spin_lock_irqsave(&cq->lock, flags);
         if (mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) {
                 /* make sure no soft wqe's are waiting */
                 if (unlikely(!list_empty(&cq->wc_list)))
@@ -625,6 +624,7 @@ int mlx5_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc)
                 goto out;
         }
+       spin_lock_irqsave(&cq->lock, flags);
         if (unlikely(!list_empty(&cq->wc_list)))
                 soft_polled = poll_soft_wc(cq, num_entries, wc, false);
@@ -635,9 +635,9 @@ int mlx5_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc)
         if (npolled)
                 mlx5_cq_set_ci(&cq->mcq);
-out:
         spin_unlock_irqrestore(&cq->lock, flags);
+out:
         return soft_polled + npolled;
}
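A minimal userspace model of the control flow this patch produces may help when reviewing it. It is a sketch under stated assumptions, not driver code: fake_lock, fake_cq, drain_pending() and poll_cq_sketch() are hypothetical stand-ins for cq->lock, the mlx5 CQ and mlx5_ib_poll_cq(); device_in_error models the MLX5_DEVICE_STATE_INTERNAL_ERROR check; and the error path simply drains pending entries instead of generating flush completions as the real driver does. With out: placed after the unlock, the error path returns without ever touching the CQ lock, while the normal path still takes and releases it exactly once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for cq->lock: records acquisitions so the
 * control flow can be checked. Not a real locking primitive. */
struct fake_lock {
	int acquisitions;
	bool held;
};

/* Hypothetical stand-in for the mlx5 CQ. */
struct fake_cq {
	struct fake_lock lock;
	bool device_in_error; /* models mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR */
	int pending;          /* entries waiting to be polled */
};

static void fake_lock_acquire(struct fake_lock *l)
{
	assert(!l->held); /* a double acquire would deadlock a real spinlock */
	l->held = true;
	l->acquisitions++;
}

static void fake_lock_release(struct fake_lock *l)
{
	assert(l->held); /* releasing a lock that is not held is a bug */
	l->held = false;
}

/* Moves up to num_entries pending entries out of the CQ. */
static int drain_pending(struct fake_cq *cq, int num_entries)
{
	int n = 0;

	while (n < num_entries && cq->pending > 0) {
		cq->pending--;
		n++;
	}
	return n;
}

/* Mirrors the patched flow: the error path exits before the lock is
 * taken, and the out: label sits after the unlock. */
static int poll_cq_sketch(struct fake_cq *cq, int num_entries)
{
	int npolled;

	if (cq->device_in_error) {
		/* Simplification: the real driver generates flush
		 * completions here; this model only drains entries. */
		npolled = drain_pending(cq, num_entries);
		goto out;
	}

	fake_lock_acquire(&cq->lock);
	npolled = drain_pending(cq, num_entries);
	fake_lock_release(&cq->lock);
out:
	return npolled;
}
```

Note that this reordering only changes behaviour once the device has already been marked as being in the internal-error state. The trace attached later in this thread, taken with this patch applied, still faults in _raw_spin_lock_irqsave() called from mlx5_ib_poll_cq(), which suggests the lock was reached on the normal polling path.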


On 28/05/2024 9:18, Berger, Michal wrote:
> 
> Hi Shay,
> 
> I appreciate your feedback. I applied the suggested change on top of our
> 6.8.9 kernel build, but I am afraid it didn't solve the problem.
> Granted, the stack trace doesn't point at mlx5_health* anymore, but
> the panic happens at exactly the same moment: it takes a couple dozen
> tries to trigger it, but it's still there. Attaching the latest trace.
> 
> Michal Berger
> 
> 
> Intel Technology Poland sp. z o.o. - ul. Slowackiego 173, 80-298 Gdansk
> 
> KRS 101882
> 
> NIP 957-07-52-316
> 
> ------------------------------------------------------------------------
> *From:* Shay Drori <shayd@nvidia.com>
> *Sent:* Sunday, May 26, 2024 2:35 PM
> *To:* Berger, Michal <michal.berger@intel.com>; netdev@vger.kernel.org 
> <netdev@vger.kernel.org>; moshe@nvidia.com <moshe@nvidia.com>
> *Subject:* Re: Kernel panic triggered while removing mlx5_core devices 
> from the pci bus
> Hi Michal.
> 
> Can you please try the change below [1]?
> We tried it locally and it seems to solve the issue.
> 
> thanks
> Shay Drory
> 
> [1]
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index 6574c145dc1e..459a836a5d9c 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -1298,6 +1298,9 @@ static int mlx5_function_teardown(struct mlx5_core_dev *dev, bool boot)
>           if (!err)
>                   mlx5_function_disable(dev, boot);
> +       else
> +               mlx5_stop_health_poll(dev, boot);
> +
>           return err;
> }
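The intent of this mlx5_function_teardown() change can be illustrated with a small userspace model. fake_dev and the fake_* helpers are hypothetical stand-ins, not the mlx5_core API, and the model assumes, as the patch implies, that the success path already stops health polling inside mlx5_function_disable(). The added else branch ensures the periodic health poller is stopped even when the teardown command fails, so it does not keep running against a device that is going away.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an mlx5 core device; not the real struct. */
struct fake_dev {
	bool health_poll_active; /* periodic health poller running */
	bool function_enabled;   /* function still enabled on the device */
	bool teardown_fails;     /* simulate a failing teardown command */
};

static int fake_teardown_cmd(struct fake_dev *dev)
{
	return dev->teardown_fails ? -5 /* -EIO */ : 0;
}

static void fake_stop_health_poll(struct fake_dev *dev)
{
	dev->health_poll_active = false;
}

/* Assumed to stop health polling itself, as the patch implies for
 * mlx5_function_disable(). */
static void fake_function_disable(struct fake_dev *dev)
{
	assert(dev->function_enabled); /* disabling twice would be a bug */
	fake_stop_health_poll(dev);
	dev->function_enabled = false;
}

/* Mirrors the patched teardown: whichever way the teardown command
 * goes, the health poller ends up stopped. */
static int function_teardown_sketch(struct fake_dev *dev)
{
	int err = fake_teardown_cmd(dev);

	if (!err)
		fake_function_disable(dev);
	else
		fake_stop_health_poll(dev); /* the branch the patch adds */

	return err;
}
```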
> 
> 
> 
> On 24/05/2024 11:07, Berger, Michal wrote:
>> Kernel: 6.7.0, 6.8.8 (fedora builds)
>> Devices: MT27710 Family [ConnectX-4 Lx] (0x1015), fw_ver: 14.23.1020
>> rdma-core: 44.0
>> 
>> We have a small test which performs a somewhat controlled hotplug of the net device on the PCI bus (via sysfs). The affected device is part of an nvmf-rdma setup running in an SPDK context (see https://github.com/spdk/spdk/blob/master/test/nvmf/target/device_removal.sh). Sometimes (unfortunately it does not reproduce on every run), when the device is removed, the kernel hits
>> an Oops. With our panic setup it is then followed by a kernel reboot, but if we allow the kernel to continue, it eventually deadlocks.
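A controlled hot-remove like the one described here can be driven from userspace through the kernel's PCI sysfs interface: writing 1 to /sys/bus/pci/devices/<BDF>/remove removes the function, and writing 1 to /sys/bus/pci/rescan brings it back. The C sketch below assumes that is roughly what the test does; it is not taken from the SPDK device_removal.sh script, whose exact steps may differ, and sysfs_write_one() and pci_remove_rescan_cycle() are hypothetical helper names. Acting on real sysfs files requires root.

```c
#include <stdio.h>

/* Writes "1" to a sysfs attribute such as
 * /sys/bus/pci/devices/<BDF>/remove or /sys/bus/pci/rescan.
 * Returns 0 on success and -1 on any failure. */
static int sysfs_write_one(const char *path)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs("1", f) == EOF) {
		fclose(f);
		return -1;
	}
	/* stdio buffers the write; errors from sysfs surface at fclose(). */
	return fclose(f) == 0 ? 0 : -1;
}

/* One hot-remove + rescan cycle (hypothetical helper). */
static int pci_remove_rescan_cycle(const char *remove_path, const char *rescan_path)
{
	if (sysfs_write_one(remove_path) != 0)
		return -1;
	return sysfs_write_one(rescan_path);
}
```

Because the Oops does not show up on every run, such a cycle would typically be repeated in a loop, for example with the function from the attached log: pci_remove_rescan_cycle("/sys/bus/pci/devices/0000:82:00.0/remove", "/sys/bus/pci/rescan").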
>> 
>> This happens across different systems using the same set of NICs. Examples of these oopses are attached.
>> 
>> Just to note, we previously had the same issue under older kernels (e.g. 6.1), all reported here: https://bugzilla.kernel.org/show_bug.cgi?id=218288. Bumping to 6.7.0
>> helped reduce the frequency of this issue, but unfortunately it's still there.
>> 
>> Any hints on how to tackle this issue would be appreciated.
>> 
>> Regards,
>> Michal


* Re: Kernel panic triggered while removing mlx5_core devices from the pci bus
From: Berger, Michal @ 2024-06-04  9:44 UTC
  To: Shay Drori, netdev@vger.kernel.org, moshe@nvidia.com,
	linux-rdma@vger.kernel.org, phaddad@nvidia.com




Hi Shay,

I am afraid the suggested change didn't help (I have now applied both patches on top of 6.8.9): the kernel still crashes with very similar (if not identical) traces. Please see the attached log.

Regards,
Michal



[-- Attachment #2: oops1.log --]
[-- Type: application/octet-stream, Size: 8647 bytes --]

2024-06-04T11:25:54+02:00	Jun  4 09:25:54 10.211.164.109 [ 2993.151247] mlx5_core 0000:82:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
2024-06-04T11:25:54+02:00	Jun  4 09:25:54 10.211.164.109 [ 2993.181769] mlx5_core 0000:82:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
2024-06-04T11:25:54+02:00	Jun  4 09:25:54 10.211.164.109 [ 2993.216141] BUG: unable to handle page fault for address: ffffffffabe20660
2024-06-04T11:25:54+02:00	Jun  4 09:25:54 10.211.164.109 [ 2993.224077] #PF: supervisor write access in kernel mode
2024-06-04T11:25:54+02:00	Jun  4 09:25:54 10.211.164.109 [ 2993.230147] #PF: error_code(0x0002) - not-present page
2024-06-04T11:25:54+02:00	Jun  4 09:25:54 10.211.164.109 [ 2993.236123] PGD 7e342d067 P4D 7e342d067 PUD 7e342e063 PMD 800ffff81c1ff062 
2024-06-04T11:25:54+02:00	Jun  4 09:25:54 10.211.164.109 [ 2993.244145] Oops: 0002 [#1] PREEMPT SMP PTI
2024-06-04T11:25:54+02:00	Jun  4 09:25:54 10.211.164.109 [ 2993.249085] CPU: 14 PID: 381 Comm: kworker/u85:0 Tainted: G           OE      6.8.9-200.fc39.x86_64 #1
2024-06-04T11:25:54+02:00	Jun  4 09:25:54 10.211.164.109 [ 2993.259722] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.06.0006.032420170950 03/24/2017
2024-06-04T11:25:54+02:00	Jun  4 09:25:54 10.211.164.109 [ 2993.271426] Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
2024-06-04T11:25:54+02:00	Jun  4 09:25:54 10.211.164.109 [ 2993.278333] RIP: 0010:native_queued_spin_lock_slowpath+0x27f/0x2d0
2024-06-04T11:25:55+02:00	Jun  4 09:25:54 10.211.164.109 [ 2993.285483] Code: 41 89 d6 44 0f b7 e8 41 83 ee 01 49 c1 e5 05 4d 63 f6 49 81 c5 00 56 03 00 49 81 fe 00 20 00 00 73 45 4e 03 2c f5 a0 3c c0 aa <49> 89 6d 00 8b 45 08 85 c0 75 09 f3 90 8b 45 08 85 c0 74 f7 48 8b
2024-06-04T11:25:55+02:00	Jun  4 09:25:54 10.211.164.109 [ 2993.306972] RSP: 0018:ffffc042c7ebbd48 EFLAGS: 00010086
2024-06-04T11:25:55+02:00	Jun  4 09:25:54 10.211.164.109 [ 2993.313079] RAX: 0000000000000003 RBX: ffff9e89fa7a9e00 RCX: 0000000000000010
2024-06-04T11:25:55+02:00	Jun  4 09:25:54 10.211.164.109 [ 2993.321315] RDX: 000000000000101f RSI: 00000000407fa9c8 RDI: ffff9e89fa7a9e00
2024-06-04T11:25:55+02:00	Jun  4 09:25:54 10.211.164.109 [ 2993.329555] RBP: ffff9e89fbe35600 R08: 2c6f6c6e622c6168 R09: ffff9e89fa58d640
2024-06-04T11:25:55+02:00	Jun  4 09:25:54 10.211.164.109 [ 2993.337981] R10: 000000000000000f R11: fefefefefefefeff R12: 00000000003c0000
2024-06-04T11:25:55+02:00	Jun  4 09:25:54 10.211.164.109 [ 2993.346246] R13: ffffffffabe20660 R14: 000000000000101e R15: ffff9e89fa7a9e00
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.354502] FS:  0000000000000000(0000) GS:ffff9e89fbe00000(0000) knlGS:0000000000000000
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.363878] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.370644] CR2: ffffffffabe20660 CR3: 00000007e3428002 CR4: 00000000001706f0
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.378967] Call Trace:
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.382036]  <TASK>
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.384729]  ? __die+0x23/0x70
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.388503]  ? page_fault_oops+0x171/0x4f0
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.393434]  ? exc_page_fault+0x175/0x180
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.398276]  ? asm_exc_page_fault+0x26/0x30
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.403309]  ? native_queued_spin_lock_slowpath+0x27f/0x2d0
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.409878]  _raw_spin_lock_irqsave+0x3d/0x50
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.415087]  mlx5_ib_poll_cq+0x5d/0xe40 [mlx5_ib]
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.420754]  ? finish_task_switch.isra.0+0x94/0x2f0
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.426575]  __ib_process_cq+0x4f/0x180 [ib_core]
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.432246]  ib_cq_poll_work+0x2a/0x80 [ib_core]
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.437811]  process_one_work+0x176/0x340
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.442674]  worker_thread+0x27b/0x3a0
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.447246]  ? __pfx_worker_thread+0x10/0x10
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.452402]  kthread+0xe8/0x120
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.456311]  ? __pfx_kthread+0x10/0x10
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.460876]  ret_from_fork+0x34/0x50
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.465271]  ? __pfx_kthread+0x10/0x10
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.469849]  ret_from_fork_asm+0x1b/0x30
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.474636]  </TASK>
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.477477] Modules linked in: nvme_rdma nvme_fabrics vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd rdma_ucm rdma_cm iw_cm ib_umad ib_cm usdm_drv(OE) intel_qat(OE) rfkill uio sunrpc binfmt_misc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_ib kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 i40e ipmi_si rapl iTCO_wdt ib_uverbs macsec pktcdvd mei_me ipmi_devintf intel_cstate intel_pmc_bxt iTCO_vendor_support libsas i2c_i801 ib_core mei mgag200 intel_uncore ipmi_msghandler dax_pmem pcspkr scsi_transport_sas ioatdma lpc_ich i2c_smbus wmi joydev ip6_tables ip_tables fuse zram bpf_preload loop overlay squashfs netconsole nd_pmem nd_btt nd_e820 libnvdimm virtio_blk virtio_net net_failover failover uas usb_storage nvme nvme_core nvme_auth mlx5_core mlxfw psample tls pci_hyperv_intf ice(OE) gnss ixgbe mdio igb i2c_algo_bit dca [last unloaded: nvme_fabrics]
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.580609] CR2: ffffffffabe20660
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.584783] ---[ end trace 0000000000000000 ]---
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.676597] RIP: 0010:native_queued_spin_lock_slowpath+0x27f/0x2d0
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.683969] Code: 41 89 d6 44 0f b7 e8 41 83 ee 01 49 c1 e5 05 4d 63 f6 49 81 c5 00 56 03 00 49 81 fe 00 20 00 00 73 45 4e 03 2c f5 a0 3c c0 aa <49> 89 6d 00 8b 45 08 85 c0 75 09 f3 90 8b 45 08 85 c0 74 f7 48 8b
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.705873] RSP: 0018:ffffc042c7ebbd48 EFLAGS: 00010086
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.712167] RAX: 0000000000000003 RBX: ffff9e89fa7a9e00 RCX: 0000000000000010
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.720614] RDX: 000000000000101f RSI: 00000000407fa9c8 RDI: ffff9e89fa7a9e00
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.729045] RBP: ffff9e89fbe35600 R08: 2c6f6c6e622c6168 R09: ffff9e89fa58d640
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.737470] R10: 000000000000000f R11: fefefefefefefeff R12: 00000000003c0000
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.745869] R13: ffffffffabe20660 R14: 000000000000101e R15: ffff9e89fa7a9e00
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.754290] FS:  0000000000000000(0000) GS:ffff9e89fbe00000(0000) knlGS:0000000000000000
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.763786] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.770658] CR2: ffffffffabe20660 CR3: 00000007e3428002 CR4: 00000000001706f0
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.779080] Kernel panic - not syncing: Fatal exception
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.785433] Kernel Offset: 0x28000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.842155] pstore: backend (erst) writing error (-28)
2024-06-04T11:25:55+02:00	Jun  4 09:25:55 10.211.164.109 [ 2993.848273] Rebooting in 5 seconds..
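Decoding the lock word offers one hedged reading of this trace. The faulting bytes <49> 89 6d 00 encode a store of RBP through R13, and R13 equals CR2; in the generic qspinlock slowpath this matches linking the current CPU's MCS node behind the previous tail (prev = decode_tail(old); prev->next = node). The sketch below re-implements the tail decoding for the CONFIG_NR_CPUS < 16K layout described in include/asm-generic/qspinlock_types.h; decode_qspinlock_tail() and qspinlock_unused_bits_set() are hypothetical names. Assuming RSI (0x407fa9c8) holds the lock word the slowpath read, which the trace itself does not state, its tail decodes to CPU 0x101e (4126) and node index 3, matching R14 and RAX, an implausible CPU number for this machine, and its unused bits 9-15 are set. A live qspinlock should never contain such a value, which suggests, without proving it, that cq->lock was read from memory that no longer held a valid spinlock, for example a CQ freed during device removal.

```c
#include <stdint.h>

/* Generic qspinlock word layout when CONFIG_NR_CPUS < 16K:
 *   bits  0-7 : locked byte
 *   bit     8 : pending
 *   bits  9-15: not used
 *   bits 16-17: tail index (which per-CPU MCS node)
 *   bits 18-31: tail CPU + 1 (0 means no queued waiter)
 */
#define Q_TAIL_IDX_OFFSET 16u
#define Q_TAIL_IDX_MASK   (0x3u << Q_TAIL_IDX_OFFSET)
#define Q_TAIL_CPU_OFFSET 18u
#define Q_TAIL_MASK       0xffff0000u
#define Q_UNUSED_MASK     0x0000fe00u

struct qtail {
	int has_tail; /* nonzero if the word names a queued waiter */
	int cpu;      /* tail CPU field minus one, as the kernel computes it */
	int idx;      /* MCS node index on that CPU */
};

/* Userspace re-implementation of the kernel's tail decoding. */
static struct qtail decode_qspinlock_tail(uint32_t val)
{
	struct qtail t;

	t.has_tail = (val & Q_TAIL_MASK) != 0;
	t.cpu = (int)(val >> Q_TAIL_CPU_OFFSET) - 1;
	t.idx = (int)((val & Q_TAIL_IDX_MASK) >> Q_TAIL_IDX_OFFSET);
	return t;
}

/* The generic code only ever stores 0 or 1 in the pending byte, so any
 * of bits 9-15 being set points at a corrupted or reused lock word. */
static int qspinlock_unused_bits_set(uint32_t val)
{
	return (val & Q_UNUSED_MASK) != 0;
}
```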
