public inbox for stable@vger.kernel.org
From: Shay Drori <shayd@nvidia.com>
To: Wang Yugui <wangyugui@e16-tech.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <linux-kernel@vger.kernel.org>, <akpm@linux-foundation.org>,
	<torvalds@linux-foundation.org>, <stable@vger.kernel.org>,
	<lwn@lwn.net>, <jslaby@suse.cz>
Subject: Re: Linux 6.6.28
Date: Sat, 20 Apr 2024 21:28:33 +0300
Message-ID: <220e55df-a0a2-4272-b94f-c7c4c6fbf2b7@nvidia.com>
In-Reply-To: <20240420135914.2AD9.409509F4@e16-tech.com>



On 20/04/2024 8:59, Wang Yugui wrote:
> Hi,
> 
>> I'm announcing the release of the 6.6.28 kernel.
>>
>> All users of the 6.6 kernel series must upgrade.
>>
>> The updated 6.6.y git tree can be found at:
>>        git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.6.y
>> and can be browsed at the normal kernel.org git web browser:
>>        https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
> 
> Linux 6.6.28 fails to boot with the following panic (*1) on a server with a
> Mellanox ConnectX-6 VPI NIC, but 6.6.27 and 6.1.87 boot fine.
> 
> After reverting 'net/mlx5: Restore mistakenly dropped parts in register devlink
> flow', Linux boots fine.
> 

There is a similar discussion on the netdev mailing list [1].
In short, it seems that stable is missing the following patch, which is a
prerequisite for the bad patch:
0553e753ea9e
"net/mlx5: E-switch, store eswitch pointer before registering
devlink_param".

Wang, can you test it out please?

thanks
Shay

[1]
https://lore.kernel.org/netdev/20240419162842.69433-1-oxana@cloudflare.com/T/#m9a8dd7f2e76d805baf2ea441137928a4dc6a11a7 


> There is already a patch (*2) upstream, but it is not yet in queue-6.6 (for
> the upcoming 6.6.29).
> 
> 
> *1 panic info:
> [   15.114364] BUG: unable to handle page fault for address: 0000000000001118
> [   15.114815] infiniband bnxt_re0: Device registered with IB successfully
> [   15.114822] #PF: supervisor read access in kernel mode
> [   15.134119] #PF: error_code(0x0000) - not-present page
> [   15.139652] PGD 0 P4D 0
> [   15.142553] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [   15.143055] infiniband bnxt_re1: Device registered with IB successfully
> [   15.147233] CPU: 1 PID: 1253 Comm: kworker/1:4 Not tainted 6.6.28-1.el7.x86_64 #1
> [   15.147236] Hardware name: Dell Inc. PowerEdge T640/0TWW5Y, BIOS 2.21.0 12/11/2023
> [   15.147238] Workqueue: events work_for_cpu_fn
> [   15.174498] RIP: 0010:esw_port_metadata_get+0x19/0x30 [mlx5_core]
> [   15.181056] Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 d3 e8 ce 28 9a cc 48 8b 80 b0 09 00 00 <8b> 80 18 11 00 00 88 03 31 c0 80 23 01 5b c3 cc cc cc cc 0f 1f 40
> [   15.200401] RSP: 0000:ffff9ec05bf1fb98 EFLAGS: 00010286
> [   15.205930] RAX: 0000000000000000 RBX: ffff9ec05bf1fbe4 RCX: 0000000000000028
> [   15.213364] RDX: ffff9ec05bf1fbe4 RSI: 0000000000000013 RDI: ffff8bdd1d696000
> [   15.220801] RBP: ffffffffc1134c60 R08: 0000000000000000 R09: 0000000000000000
> [   15.228235] R10: ffff9ec05bf1fbf8 R11: 0000000000001000 R12: ffff8bdd1d696000
> [   15.235671] R13: ffff8bdd9541c720 R14: 0000000000000000 R15: 0000000000000000
> [   15.243098] FS:  0000000000000000(0000) GS:ffff8c3b7ea00000(0000) knlGS:0000000000000000
> [   15.251480] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   15.257520] CR2: 0000000000001118 CR3: 00000004f9220003 CR4: 00000000007706e0
> [   15.264955] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   15.272383] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   15.279800] PKRU: 55555554
> [   15.282790] Call Trace:
> [   15.285523]  <TASK>
> [   15.287905]  ? __die_body+0x1e/0x60
> [   15.291675]  ? page_fault_oops+0x151/0x490
> [   15.296050]  ? __update_idle_core+0x27/0xc0
> [   15.300505]  ? exc_page_fault+0x6b/0x150
> [   15.304700]  ? asm_exc_page_fault+0x26/0x30
> [   15.309149]  ? esw_port_metadata_get+0x19/0x30 [mlx5_core]
> [   15.315066]  ? esw_port_metadata_get+0x12/0x30 [mlx5_core]
> [   15.320940]  devlink_nl_param_fill.constprop.23+0x88/0x5d0
> [   15.326679]  ? __alloc_skb+0x87/0x190
> [   15.330594]  ? __kmalloc_node_track_caller+0x55/0x130
> [   15.335897]  ? __kmalloc_node_track_caller+0x55/0x130
> [   15.341196]  ? kmalloc_reserve+0x65/0xf0
> [   15.345370]  ? __alloc_skb+0xd9/0x190
> [   15.349280]  devlink_param_notify.constprop.20+0x72/0xd0
> [   15.354845]  devl_params_register+0x150/0x250
> [   15.359456]  esw_offloads_init+0x181/0x1a0 [mlx5_core]
> [   15.364967]  mlx5_eswitch_init+0x4be/0x6e0 [mlx5_core]
> [   15.370471]  mlx5_init_once+0xf0/0x550 [mlx5_core]
> [   15.375601]  mlx5_init_one_devl_locked+0x7a/0x1d0 [mlx5_core]
> [   15.381676]  mlx5_init_one+0x2e/0x60 [mlx5_core]
> [   15.386616]  probe_one+0x2b6/0x410 [mlx5_core]
> [   15.391382]  local_pci_probe+0x45/0xa0
> [   15.395367]  work_for_cpu_fn+0x17/0x30
> [   15.399345]  process_scheduled_works+0x8a/0x380
> [   15.404102]  worker_thread+0x165/0x2d0
> [   15.408082]  ? __pfx_worker_thread+0x10/0x10
> [   15.412578]  kthread+0xf2/0x120
> [   15.415952]  ? __pfx_kthread+0x10/0x10
> [   15.419928]  ret_from_fork+0x31/0x40
> [   15.423724]  ? __pfx_kthread+0x10/0x10
> [   15.427692]  ret_from_fork_asm+0x1b/0x30
> [   15.431827]  </TASK>
> [   15.434218] Modules linked in: xor bnxt_re zstd_compress raid6_pq ib_uverbs sd_mod ib_core t10_pi mlx5_core(+) pci_hyperv_intf mlxfw ahci libahci bnx2x mpi3mr psample i40e libata tls bnxt_en megaraid_sas scsi_transport_sas crc32c_intel mgag200 mdio i2c_algo_bit wmi dm_mirror dm_region_hash dm_log dm_mod
> [   15.461684] CR2: 0000000000001118
> [   15.465213] ---[ end trace 0000000000000000 ]---
> [   15.476059] pstore: backend (erst) writing error (-28)
> [   15.481415] RIP: 0010:esw_port_metadata_get+0x19/0x30 [mlx5_core]
> [   15.487856] Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 d3 e8 ce 28 9a cc 48 8b 80 b0 09 00 00 <8b> 80 18 11 00 00 88 03 31 c0 80 23 01 5b c3 cc cc cc cc 0f 1f 40
> [   15.507043] RSP: 0000:ffff9ec05bf1fb98 EFLAGS: 00010286
> [   15.512493] RAX: 0000000000000000 RBX: ffff9ec05bf1fbe4 RCX: 0000000000000028
> [   15.519852] RDX: ffff9ec05bf1fbe4 RSI: 0000000000000013 RDI: ffff8bdd1d696000
> [   15.527209] RBP: ffffffffc1134c60 R08: 0000000000000000 R09: 0000000000000000
> [   15.534568] R10: ffff9ec05bf1fbf8 R11: 0000000000001000 R12: ffff8bdd1d696000
> [   15.541934] R13: ffff8bdd9541c720 R14: 0000000000000000 R15: 0000000000000000
> [   15.549299] FS:  0000000000000000(0000) GS:ffff8c3b7ea00000(0000) knlGS:0000000000000000
> [   15.557618] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   15.563607] CR2: 0000000000001118 CR3: 00000004f9220003 CR4: 00000000007706e0
> [   15.570981] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   15.578356] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   15.585733] PKRU: 55555554
> [   15.588679] Kernel panic - not syncing: Fatal exception
> [   15.594163] Kernel Offset: 0xbc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> 
> 
> *2
>  From bf729988303a27833a86acb561f42b9a3cc12728 Mon Sep 17 00:00:00 2001
> From: Shay Drory <shayd@nvidia.com>
> Date: Thu, 11 Apr 2024 14:54:41 +0300
> Subject: [PATCH] net/mlx5: Restore mistakenly dropped parts in register
>   devlink flow
> 
> Fixes: c6e77aa9dd82 ("net/mlx5: Register devlink first under devlink lock")
> 
> 
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2024/04/20
> 
> 
> 

Thread overview: 5+ messages
2024-04-17  9:34 Linux 6.6.28 Greg Kroah-Hartman
2024-04-17  9:34 ` Greg Kroah-Hartman
2024-04-20  5:59 ` Wang Yugui
2024-04-20 18:28   ` Shay Drori [this message]
2024-04-20 23:00     ` Wang Yugui
