* [PATCH iwl-net v2] ice: fix memory corruption bug with suspend and rebuild
@ 2024-03-05 23:02 Jesse Brandeburg
2024-03-06 20:04 ` Simon Horman
2024-03-12 15:16 ` Pucha, HimasekharX Reddy
0 siblings, 2 replies; 4+ messages in thread
From: Jesse Brandeburg @ 2024-03-05 23:02 UTC (permalink / raw)
To: pmenzel, intel-wired-lan
Cc: Jesse Brandeburg, netdev, horms, Robert Elliott, Tony Nguyen,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Michal Swiatkowski, Przemek Kitszel
The ice driver would previously panic after suspend. This is caused
from the driver *only* calling the ice_vsi_free_q_vectors() function by
itself, when it is suspending. Since commit b3e7b3a6ee92 ("ice: prevent
NULL pointer deref during reload") the driver has zeroed out
num_q_vectors, and only restored it in ice_vsi_cfg_def().
This further causes the ice_rebuild() function to allocate a zero length
buffer, after which num_q_vectors is updated, and then the new value of
num_q_vectors is used to index into the zero length buffer, which
corrupts memory.
The fix entails making sure all the code referencing num_q_vectors only
does so after it has been reset via ice_vsi_cfg_def().
I didn't perform a full bisect, but I was able to test against 6.1.77
kernel and that ice driver works fine for suspend/resume with no panic,
so sometime since then, this problem was introduced.
Also clean up an un-needed init of a local variable in the function
being modified.
PANIC from 6.8.0-rc1:
[1026674.915596] PM: suspend exit
[1026675.664697] ice 0000:17:00.1: PTP reset successful
[1026675.664707] ice 0000:17:00.1: 2755 msecs passed between update to cached PHC time
[1026675.667660] ice 0000:b1:00.0: PTP reset successful
[1026675.675944] ice 0000:b1:00.0: 2832 msecs passed between update to cached PHC time
[1026677.137733] ixgbe 0000:31:00.0 ens787: NIC Link is Up 1 Gbps, Flow Control: None
[1026677.190201] BUG: kernel NULL pointer dereference, address: 0000000000000010
[1026677.192753] ice 0000:17:00.0: PTP reset successful
[1026677.192764] ice 0000:17:00.0: 4548 msecs passed between update to cached PHC time
[1026677.197928] #PF: supervisor read access in kernel mode
[1026677.197933] #PF: error_code(0x0000) - not-present page
[1026677.197937] PGD 1557a7067 P4D 0
[1026677.212133] ice 0000:b1:00.1: PTP reset successful
[1026677.212143] ice 0000:b1:00.1: 4344 msecs passed between update to cached PHC time
[1026677.212575]
[1026677.243142] Oops: 0000 [#1] PREEMPT SMP NOPTI
[1026677.247918] CPU: 23 PID: 42790 Comm: kworker/23:0 Kdump: loaded Tainted: G W 6.8.0-rc1+ #1
[1026677.257989] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022
[1026677.269367] Workqueue: ice ice_service_task [ice]
[1026677.274592] RIP: 0010:ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice]
[1026677.281421] Code: 0f 84 3a ff ff ff 41 0f b7 74 ec 02 66 89 b0 22 02 00 00 81 e6 ff 1f 00 00 e8 ec fd ff ff e9 35 ff ff ff 48 8b 43 30 49 63 ed <41> 0f b7 34 24 41 83 c5 01 48 8b 3c e8 66 89 b7 aa 02 00 00 81 e6
[1026677.300877] RSP: 0018:ff3be62a6399bcc0 EFLAGS: 00010202
[1026677.306556] RAX: ff28691e28980828 RBX: ff28691e41099828 RCX: 0000000000188000
[1026677.314148] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ff28691e41099828
[1026677.321730] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[1026677.329311] R10: 0000000000000007 R11: ffffffffffffffc0 R12: 0000000000000010
[1026677.336896] R13: 0000000000000000 R14: 0000000000000000 R15: ff28691e0eaa81a0
[1026677.344472] FS: 0000000000000000(0000) GS:ff28693cbffc0000(0000) knlGS:0000000000000000
[1026677.353000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1026677.359195] CR2: 0000000000000010 CR3: 0000000128df4001 CR4: 0000000000771ef0
[1026677.366779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1026677.374369] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1026677.381952] PKRU: 55555554
[1026677.385116] Call Trace:
[1026677.388023] <TASK>
[1026677.390589] ? __die+0x20/0x70
[1026677.394105] ? page_fault_oops+0x82/0x160
[1026677.398576] ? do_user_addr_fault+0x65/0x6a0
[1026677.403307] ? exc_page_fault+0x6a/0x150
[1026677.407694] ? asm_exc_page_fault+0x22/0x30
[1026677.412349] ? ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice]
[1026677.418614] ice_vsi_rebuild+0x34b/0x3c0 [ice]
[1026677.423583] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
[1026677.429147] ice_rebuild+0x18b/0x520 [ice]
[1026677.433746] ? delay_tsc+0x8f/0xc0
[1026677.437630] ice_do_reset+0xa3/0x190 [ice]
[1026677.442231] ice_service_task+0x26/0x440 [ice]
[1026677.447180] process_one_work+0x174/0x340
[1026677.451669] worker_thread+0x27e/0x390
[1026677.455890] ? __pfx_worker_thread+0x10/0x10
[1026677.460627] kthread+0xee/0x120
[1026677.464235] ? __pfx_kthread+0x10/0x10
[1026677.468445] ret_from_fork+0x2d/0x50
[1026677.472476] ? __pfx_kthread+0x10/0x10
[1026677.476671] ret_from_fork_asm+0x1b/0x30
[1026677.481050] </TASK>
Fixes: b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload")
Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---
v2: fix uninitialized coalesce pointer on the exit path by moving the
kfree to the later goto (simon), reword commit message (paul)
---
drivers/net/ethernet/intel/ice/ice_lib.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index fc23dbe302b4..cfc20684f25a 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -3238,7 +3238,7 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
{
struct ice_vsi_cfg_params params = {};
struct ice_coalesce_stored *coalesce;
- int prev_num_q_vectors = 0;
+ int prev_num_q_vectors;
struct ice_pf *pf;
int ret;
@@ -3252,13 +3252,6 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
if (WARN_ON(vsi->type == ICE_VSI_VF && !vsi->vf))
return -EINVAL;
- coalesce = kcalloc(vsi->num_q_vectors,
- sizeof(struct ice_coalesce_stored), GFP_KERNEL);
- if (!coalesce)
- return -ENOMEM;
-
- prev_num_q_vectors = ice_vsi_rebuild_get_coalesce(vsi, coalesce);
-
ret = ice_vsi_realloc_stat_arrays(vsi);
if (ret)
goto err_vsi_cfg;
@@ -3268,6 +3261,13 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
if (ret)
goto err_vsi_cfg;
+ coalesce = kcalloc(vsi->num_q_vectors,
+ sizeof(struct ice_coalesce_stored), GFP_KERNEL);
+ if (!coalesce)
+ return -ENOMEM;
+
+ prev_num_q_vectors = ice_vsi_rebuild_get_coalesce(vsi, coalesce);
+
ret = ice_vsi_cfg_tc_lan(pf, vsi);
if (ret) {
if (vsi_flags & ICE_VSI_FLAG_INIT) {
@@ -3286,8 +3286,8 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
err_vsi_cfg_tc_lan:
ice_vsi_decfg(vsi);
-err_vsi_cfg:
kfree(coalesce);
+err_vsi_cfg:
return ret;
}
base-commit: 4daa873133d3db4e17f4ccd9fe1102e4fbab7700
prerequisite-patch-id: 3108bcb752993e56fb77c40c95ff495617203de7
prerequisite-patch-id: e8d245c2c933aa302c2e1861e68d829b17bd1de1
prerequisite-patch-id: 8d0c4c9f22c20ccc76d0d7c075b13656aebc6876
prerequisite-patch-id: 39d1f161a2c9e46d8ee8b0a216bbe8f6587b8f0c
prerequisite-patch-id: 8dc559f5035db8e5eefc088168d5ff13ffaed849
prerequisite-patch-id: cf2908b038fe6db14d227dd1aa5b27d71316c7a8
prerequisite-patch-id: 54b81b89b6de45a5a93de4ab3f584865d61ecc64
prerequisite-patch-id: 2e25791729def6f00f8f2faee1e2e49926ab7a9c
prerequisite-patch-id: d6d5312b053b713666632facf0807540df5ea799
prerequisite-patch-id: f92dc6a5ff69a78568e3c272819a907ff2fa083b
prerequisite-patch-id: 6310ccdf59f5a447add2b6d0529b16b3f8bcdbc9
prerequisite-patch-id: 77af9426267ceb7f6a1ac04e7cb21868bf1f5460
prerequisite-patch-id: cd48ee2a7f6eb98827f3362d0c0e4f233e5b2e94
--
2.39.3
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH iwl-net v2] ice: fix memory corruption bug with suspend and rebuild
2024-03-05 23:02 [PATCH iwl-net v2] ice: fix memory corruption bug with suspend and rebuild Jesse Brandeburg
@ 2024-03-06 20:04 ` Simon Horman
2024-03-06 20:08 ` [Intel-wired-lan] " Loktionov, Aleksandr
2024-03-12 15:16 ` Pucha, HimasekharX Reddy
1 sibling, 1 reply; 4+ messages in thread
From: Simon Horman @ 2024-03-06 20:04 UTC (permalink / raw)
To: Jesse Brandeburg
Cc: pmenzel, intel-wired-lan, netdev, Robert Elliott, Tony Nguyen,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Michal Swiatkowski, Przemek Kitszel
On Tue, Mar 05, 2024 at 03:02:03PM -0800, Jesse Brandeburg wrote:
> The ice driver would previously panic after suspend. This is caused
> from the driver *only* calling the ice_vsi_free_q_vectors() function by
> itself, when it is suspending. Since commit b3e7b3a6ee92 ("ice: prevent
> NULL pointer deref during reload") the driver has zeroed out
> num_q_vectors, and only restored it in ice_vsi_cfg_def().
>
> This further causes the ice_rebuild() function to allocate a zero length
> buffer, after which num_q_vectors is updated, and then the new value of
> num_q_vectors is used to index into the zero length buffer, which
> corrupts memory.
>
> The fix entails making sure all the code referencing num_q_vectors only
> does so after it has been reset via ice_vsi_cfg_def().
>
> I didn't perform a full bisect, but I was able to test against 6.1.77
> kernel and that ice driver works fine for suspend/resume with no panic,
> so sometime since then, this problem was introduced.
>
> Also clean up an un-needed init of a local variable in the function
> being modified.
>
> PANIC from 6.8.0-rc1:
>
> [1026674.915596] PM: suspend exit
> [1026675.664697] ice 0000:17:00.1: PTP reset successful
> [1026675.664707] ice 0000:17:00.1: 2755 msecs passed between update to cached PHC time
> [1026675.667660] ice 0000:b1:00.0: PTP reset successful
> [1026675.675944] ice 0000:b1:00.0: 2832 msecs passed between update to cached PHC time
> [1026677.137733] ixgbe 0000:31:00.0 ens787: NIC Link is Up 1 Gbps, Flow Control: None
> [1026677.190201] BUG: kernel NULL pointer dereference, address: 0000000000000010
> [1026677.192753] ice 0000:17:00.0: PTP reset successful
> [1026677.192764] ice 0000:17:00.0: 4548 msecs passed between update to cached PHC time
> [1026677.197928] #PF: supervisor read access in kernel mode
> [1026677.197933] #PF: error_code(0x0000) - not-present page
> [1026677.197937] PGD 1557a7067 P4D 0
> [1026677.212133] ice 0000:b1:00.1: PTP reset successful
> [1026677.212143] ice 0000:b1:00.1: 4344 msecs passed between update to cached PHC time
> [1026677.212575]
> [1026677.243142] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [1026677.247918] CPU: 23 PID: 42790 Comm: kworker/23:0 Kdump: loaded Tainted: G W 6.8.0-rc1+ #1
> [1026677.257989] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022
> [1026677.269367] Workqueue: ice ice_service_task [ice]
> [1026677.274592] RIP: 0010:ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice]
> [1026677.281421] Code: 0f 84 3a ff ff ff 41 0f b7 74 ec 02 66 89 b0 22 02 00 00 81 e6 ff 1f 00 00 e8 ec fd ff ff e9 35 ff ff ff 48 8b 43 30 49 63 ed <41> 0f b7 34 24 41 83 c5 01 48 8b 3c e8 66 89 b7 aa 02 00 00 81 e6
> [1026677.300877] RSP: 0018:ff3be62a6399bcc0 EFLAGS: 00010202
> [1026677.306556] RAX: ff28691e28980828 RBX: ff28691e41099828 RCX: 0000000000188000
> [1026677.314148] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ff28691e41099828
> [1026677.321730] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [1026677.329311] R10: 0000000000000007 R11: ffffffffffffffc0 R12: 0000000000000010
> [1026677.336896] R13: 0000000000000000 R14: 0000000000000000 R15: ff28691e0eaa81a0
> [1026677.344472] FS: 0000000000000000(0000) GS:ff28693cbffc0000(0000) knlGS:0000000000000000
> [1026677.353000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [1026677.359195] CR2: 0000000000000010 CR3: 0000000128df4001 CR4: 0000000000771ef0
> [1026677.366779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [1026677.374369] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [1026677.381952] PKRU: 55555554
> [1026677.385116] Call Trace:
> [1026677.388023] <TASK>
> [1026677.390589] ? __die+0x20/0x70
> [1026677.394105] ? page_fault_oops+0x82/0x160
> [1026677.398576] ? do_user_addr_fault+0x65/0x6a0
> [1026677.403307] ? exc_page_fault+0x6a/0x150
> [1026677.407694] ? asm_exc_page_fault+0x22/0x30
> [1026677.412349] ? ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice]
> [1026677.418614] ice_vsi_rebuild+0x34b/0x3c0 [ice]
> [1026677.423583] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
> [1026677.429147] ice_rebuild+0x18b/0x520 [ice]
> [1026677.433746] ? delay_tsc+0x8f/0xc0
> [1026677.437630] ice_do_reset+0xa3/0x190 [ice]
> [1026677.442231] ice_service_task+0x26/0x440 [ice]
> [1026677.447180] process_one_work+0x174/0x340
> [1026677.451669] worker_thread+0x27e/0x390
> [1026677.455890] ? __pfx_worker_thread+0x10/0x10
> [1026677.460627] kthread+0xee/0x120
> [1026677.464235] ? __pfx_kthread+0x10/0x10
> [1026677.468445] ret_from_fork+0x2d/0x50
> [1026677.472476] ? __pfx_kthread+0x10/0x10
> [1026677.476671] ret_from_fork_asm+0x1b/0x30
> [1026677.481050] </TASK>
>
> Fixes: b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload")
> Reported-by: Robert Elliott <elliott@hpe.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> ---
> v2: fix uninitialized coalesce pointer on the exit path by moving the
> kfree to the later goto (simon), reword commit message (paul)
Thanks for the update.
Reviewed-by: Simon Horman <horms@kernel.org>
> ---
> drivers/net/ethernet/intel/ice/ice_lib.c | 18 +++++++++---------
> 1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> index fc23dbe302b4..cfc20684f25a 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> @@ -3238,7 +3238,7 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
> {
> struct ice_vsi_cfg_params params = {};
> struct ice_coalesce_stored *coalesce;
> - int prev_num_q_vectors = 0;
> + int prev_num_q_vectors;
> struct ice_pf *pf;
> int ret;
>
> @@ -3252,13 +3252,6 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
> if (WARN_ON(vsi->type == ICE_VSI_VF && !vsi->vf))
> return -EINVAL;
>
> - coalesce = kcalloc(vsi->num_q_vectors,
> - sizeof(struct ice_coalesce_stored), GFP_KERNEL);
> - if (!coalesce)
> - return -ENOMEM;
> -
> - prev_num_q_vectors = ice_vsi_rebuild_get_coalesce(vsi, coalesce);
> -
> ret = ice_vsi_realloc_stat_arrays(vsi);
> if (ret)
> goto err_vsi_cfg;
> @@ -3268,6 +3261,13 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
> if (ret)
> goto err_vsi_cfg;
>
> + coalesce = kcalloc(vsi->num_q_vectors,
> + sizeof(struct ice_coalesce_stored), GFP_KERNEL);
> + if (!coalesce)
> + return -ENOMEM;
> +
> + prev_num_q_vectors = ice_vsi_rebuild_get_coalesce(vsi, coalesce);
> +
> ret = ice_vsi_cfg_tc_lan(pf, vsi);
> if (ret) {
> if (vsi_flags & ICE_VSI_FLAG_INIT) {
> @@ -3286,8 +3286,8 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
>
> err_vsi_cfg_tc_lan:
> ice_vsi_decfg(vsi);
> -err_vsi_cfg:
> kfree(coalesce);
> +err_vsi_cfg:
FWIIW, I might have dropped the err_vsi_cfg label all together
and simply returned at points that previously used it.
But that would not be functionally different to what you have done.
> return ret;
> }
...
^ permalink raw reply [flat|nested] 4+ messages in thread* RE: [Intel-wired-lan] [PATCH iwl-net v2] ice: fix memory corruption bug with suspend and rebuild
2024-03-06 20:04 ` Simon Horman
@ 2024-03-06 20:08 ` Loktionov, Aleksandr
0 siblings, 0 replies; 4+ messages in thread
From: Loktionov, Aleksandr @ 2024-03-06 20:08 UTC (permalink / raw)
To: Simon Horman, Brandeburg, Jesse
Cc: pmenzel@molgen.mpg.de, Michal Swiatkowski, netdev@vger.kernel.org,
Kitszel, Przemyslaw, Eric Dumazet, Nguyen, Anthony L,
Jakub Kicinski, intel-wired-lan@lists.osuosl.org, Paolo Abeni,
David S. Miller, Elliott, Rob
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On
> Behalf Of Simon Horman
> Sent: Wednesday, March 6, 2024 9:04 PM
> To: Brandeburg, Jesse <jesse.brandeburg@intel.com>
> Cc: pmenzel@molgen.mpg.de; Michal Swiatkowski
> <michal.swiatkowski@linux.intel.com>; netdev@vger.kernel.org;
> Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; Eric Dumazet
> <edumazet@google.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; Jakub Kicinski <kuba@kernel.org>;
> intel-wired-lan@lists.osuosl.org; Paolo Abeni <pabeni@redhat.com>;
> David S. Miller <davem@davemloft.net>; Elliott, Rob
> <elliott@hpe.com>
> Subject: Re: [Intel-wired-lan] [PATCH iwl-net v2] ice: fix memory
> corruption bug with suspend and rebuild
>
> On Tue, Mar 05, 2024 at 03:02:03PM -0800, Jesse Brandeburg wrote:
> > The ice driver would previously panic after suspend. This is
> caused
> > from the driver *only* calling the ice_vsi_free_q_vectors()
> function
> > by itself, when it is suspending. Since commit b3e7b3a6ee92
> ("ice:
> > prevent NULL pointer deref during reload") the driver has zeroed
> out
> > num_q_vectors, and only restored it in ice_vsi_cfg_def().
> >
> > This further causes the ice_rebuild() function to allocate a zero
> > length buffer, after which num_q_vectors is updated, and then the
> new
> > value of num_q_vectors is used to index into the zero length
> buffer,
> > which corrupts memory.
> >
> > The fix entails making sure all the code referencing
> num_q_vectors
> > only does so after it has been reset via ice_vsi_cfg_def().
> >
> > I didn't perform a full bisect, but I was able to test against
> 6.1.77
> > kernel and that ice driver works fine for suspend/resume with no
> > panic, so sometime since then, this problem was introduced.
> >
> > Also clean up an un-needed init of a local variable in the
> function
> > being modified.
> >
> > PANIC from 6.8.0-rc1:
> >
> > [1026674.915596] PM: suspend exit
> > [1026675.664697] ice 0000:17:00.1: PTP reset successful
> > [1026675.664707] ice 0000:17:00.1: 2755 msecs passed between
> update to
> > cached PHC time [1026675.667660] ice 0000:b1:00.0: PTP reset
> > successful [1026675.675944] ice 0000:b1:00.0: 2832 msecs passed
> > between update to cached PHC time [1026677.137733] ixgbe
> 0000:31:00.0
> > ens787: NIC Link is Up 1 Gbps, Flow Control: None
> [1026677.190201]
> > BUG: kernel NULL pointer dereference, address: 0000000000000010
> > [1026677.192753] ice 0000:17:00.0: PTP reset successful
> > [1026677.192764] ice 0000:17:00.0: 4548 msecs passed between
> update to
> > cached PHC time [1026677.197928] #PF: supervisor read access in
> kernel
> > mode [1026677.197933] #PF: error_code(0x0000) - not-present page
> > [1026677.197937] PGD 1557a7067 P4D 0 [1026677.212133] ice
> > 0000:b1:00.1: PTP reset successful [1026677.212143] ice
> 0000:b1:00.1:
> > 4344 msecs passed between update to cached PHC time
> [1026677.212575]
> > [1026677.243142] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [1026677.247918] CPU: 23 PID: 42790 Comm: kworker/23:0 Kdump:
> loaded Tainted: G W 6.8.0-rc1+ #1
> > [1026677.257989] Hardware name: Intel Corporation
> > M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810
> > 02/16/2022 [1026677.269367] Workqueue: ice ice_service_task [ice]
> > [1026677.274592] RIP:
> 0010:ice_vsi_rebuild_set_coalesce+0x130/0x1e0
> > [ice] [1026677.281421] Code: 0f 84 3a ff ff ff 41 0f b7 74 ec 02
> 66 89
> > b0 22 02 00 00 81 e6 ff 1f 00 00 e8 ec fd ff ff e9 35 ff ff ff 48
> 8b
> > 43 30 49 63 ed <41> 0f b7 34 24 41 83 c5 01 48 8b 3c e8 66 89 b7
> aa 02
> > 00 00 81 e6 [1026677.300877] RSP: 0018:ff3be62a6399bcc0 EFLAGS:
> > 00010202 [1026677.306556] RAX: ff28691e28980828 RBX:
> ff28691e41099828
> > RCX: 0000000000188000 [1026677.314148] RDX: 0000000000000000 RSI:
> > 0000000000000010 RDI: ff28691e41099828 [1026677.321730] RBP:
> > 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > [1026677.329311] R10: 0000000000000007 R11: ffffffffffffffc0 R12:
> > 0000000000000010 [1026677.336896] R13: 0000000000000000 R14:
> > 0000000000000000 R15: ff28691e0eaa81a0 [1026677.344472] FS:
> > 0000000000000000(0000) GS:ff28693cbffc0000(0000)
> > knlGS:0000000000000000 [1026677.353000] CS: 0010 DS: 0000 ES:
> 0000 CR0: 0000000080050033 [1026677.359195] CR2: 0000000000000010
> CR3: 0000000128df4001 CR4: 0000000000771ef0 [1026677.366779] DR0:
> 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [1026677.374369] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400 [1026677.381952] PKRU: 55555554 [1026677.385116]
> Call Trace:
> > [1026677.388023] <TASK>
> > [1026677.390589] ? __die+0x20/0x70
> > [1026677.394105] ? page_fault_oops+0x82/0x160 [1026677.398576]
> ?
> > do_user_addr_fault+0x65/0x6a0 [1026677.403307] ?
> > exc_page_fault+0x6a/0x150 [1026677.407694] ?
> > asm_exc_page_fault+0x22/0x30 [1026677.412349] ?
> > ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice] [1026677.418614]
> > ice_vsi_rebuild+0x34b/0x3c0 [ice] [1026677.423583]
> > ice_vsi_rebuild_by_type+0x76/0x180 [ice] [1026677.429147]
> > ice_rebuild+0x18b/0x520 [ice] [1026677.433746] ?
> delay_tsc+0x8f/0xc0
> > [1026677.437630] ice_do_reset+0xa3/0x190 [ice] [1026677.442231]
> > ice_service_task+0x26/0x440 [ice] [1026677.447180]
> > process_one_work+0x174/0x340 [1026677.451669]
> > worker_thread+0x27e/0x390 [1026677.455890] ?
> > __pfx_worker_thread+0x10/0x10 [1026677.460627]
> kthread+0xee/0x120
> > [1026677.464235] ? __pfx_kthread+0x10/0x10 [1026677.468445]
> > ret_from_fork+0x2d/0x50 [1026677.472476] ?
> __pfx_kthread+0x10/0x10
> > [1026677.476671] ret_from_fork_asm+0x1b/0x30 [1026677.481050]
> > </TASK>
> >
> > Fixes: b3e7b3a6ee92 ("ice: prevent NULL pointer deref during
> reload")
> > Reported-by: Robert Elliott <elliott@hpe.com>
> > Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> > ---
> > v2: fix uninitialized coalesce pointer on the exit path by moving
> the
> > kfree to the later goto (simon), reword commit message (paul)
>
> Thanks for the update.
>
> Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
>
> > ---
> > drivers/net/ethernet/intel/ice/ice_lib.c | 18 +++++++++---------
> > 1 file changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c
> > b/drivers/net/ethernet/intel/ice/ice_lib.c
> > index fc23dbe302b4..cfc20684f25a 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> > @@ -3238,7 +3238,7 @@ int ice_vsi_rebuild(struct ice_vsi *vsi,
> u32
> > vsi_flags) {
> > struct ice_vsi_cfg_params params = {};
> > struct ice_coalesce_stored *coalesce;
> > - int prev_num_q_vectors = 0;
> > + int prev_num_q_vectors;
> > struct ice_pf *pf;
> > int ret;
> >
> > @@ -3252,13 +3252,6 @@ int ice_vsi_rebuild(struct ice_vsi *vsi,
> u32 vsi_flags)
> > if (WARN_ON(vsi->type == ICE_VSI_VF && !vsi->vf))
> > return -EINVAL;
> >
> > - coalesce = kcalloc(vsi->num_q_vectors,
> > - sizeof(struct ice_coalesce_stored),
> GFP_KERNEL);
> > - if (!coalesce)
> > - return -ENOMEM;
> > -
> > - prev_num_q_vectors = ice_vsi_rebuild_get_coalesce(vsi,
> coalesce);
> > -
> > ret = ice_vsi_realloc_stat_arrays(vsi);
> > if (ret)
> > goto err_vsi_cfg;
> > @@ -3268,6 +3261,13 @@ int ice_vsi_rebuild(struct ice_vsi *vsi,
> u32 vsi_flags)
> > if (ret)
> > goto err_vsi_cfg;
> >
> > + coalesce = kcalloc(vsi->num_q_vectors,
> > + sizeof(struct ice_coalesce_stored),
> GFP_KERNEL);
> > + if (!coalesce)
> > + return -ENOMEM;
> > +
> > + prev_num_q_vectors = ice_vsi_rebuild_get_coalesce(vsi,
> coalesce);
> > +
> > ret = ice_vsi_cfg_tc_lan(pf, vsi);
> > if (ret) {
> > if (vsi_flags & ICE_VSI_FLAG_INIT) { @@ -3286,8 +3286,8
> @@ int
> > ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
> >
> > err_vsi_cfg_tc_lan:
> > ice_vsi_decfg(vsi);
> > -err_vsi_cfg:
> > kfree(coalesce);
> > +err_vsi_cfg:
>
> FWIIW, I might have dropped the err_vsi_cfg label all together and
> simply returned at points that previously used it.
>
> But that would not be functionally different to what you have done.
>
> > return ret;
> > }
>
> ...
^ permalink raw reply [flat|nested] 4+ messages in thread
* RE: [Intel-wired-lan] [PATCH iwl-net v2] ice: fix memory corruption bug with suspend and rebuild
2024-03-05 23:02 [PATCH iwl-net v2] ice: fix memory corruption bug with suspend and rebuild Jesse Brandeburg
2024-03-06 20:04 ` Simon Horman
@ 2024-03-12 15:16 ` Pucha, HimasekharX Reddy
1 sibling, 0 replies; 4+ messages in thread
From: Pucha, HimasekharX Reddy @ 2024-03-12 15:16 UTC (permalink / raw)
To: Brandeburg, Jesse, pmenzel@molgen.mpg.de,
intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, Kitszel, Przemyslaw, Eric Dumazet,
Nguyen, Anthony L, horms@kernel.org, Michal Swiatkowski,
Jakub Kicinski, Paolo Abeni, David S. Miller, Elliott, Rob
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Jesse Brandeburg
> Sent: Wednesday, March 6, 2024 4:32 AM
> To: pmenzel@molgen.mpg.de; intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; Eric Dumazet <edumazet@google.com>; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; horms@kernel.org; Michal Swiatkowski <michal.swiatkowski@linux.intel.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; David S. Miller <davem@davemloft.net>; Elliott, Rob <elliott@hpe.com>
> Subject: [Intel-wired-lan] [PATCH iwl-net v2] ice: fix memory corruption bug with suspend and rebuild
>
> The ice driver would previously panic after suspend. This is caused from the driver *only* calling the ice_vsi_free_q_vectors() function by itself, when it is suspending. Since commit b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload") the driver has zeroed out num_q_vectors, and only restored it in ice_vsi_cfg_def().
>
> This further causes the ice_rebuild() function to allocate a zero length buffer, after which num_q_vectors is updated, and then the new value of num_q_vectors is used to index into the zero length buffer, which corrupts memory.
>
> The fix entails making sure all the code referencing num_q_vectors only does so after it has been reset via ice_vsi_cfg_def().
>
> I didn't perform a full bisect, but I was able to test against 6.1.77 kernel and that ice driver works fine for suspend/resume with no panic, so sometime since then, this problem was introduced.
>
> Also clean up an un-needed init of a local variable in the function being modified.
>
> PANIC from 6.8.0-rc1:
>
> [1026674.915596] PM: suspend exit
> [1026675.664697] ice 0000:17:00.1: PTP reset successful [1026675.664707] ice 0000:17:00.1: 2755 msecs passed between update to cached PHC time [1026675.667660] ice 0000:b1:00.0: PTP reset successful [1026675.675944] ice 0000:b1:00.0: 2832 msecs passed between update to cached PHC time [1026677.137733] ixgbe 0000:31:00.0 ens787: NIC Link is Up 1 Gbps, Flow Control: None [1026677.190201] BUG: kernel NULL pointer dereference, address: 0000000000000010 [1026677.192753] ice 0000:17:00.0: PTP reset successful [1026677.192764] ice 0000:17:00.0: 4548 msecs passed between update to cached PHC time [1026677.197928] #PF: supervisor read access in kernel mode [1026677.197933] #PF: error_code(0x0000) - not-present page [1026677.197937] PGD 1557a7067 P4D 0 [1026677.212133] ice 0000:b1:00.1: PTP reset successful [1026677.212143] ice 0000:b1:00.1: 4344 msecs passed between update to cached PHC time [1026677.212575] > [1026677.243142] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [1026677.247918] CPU: 23 PID: 42790 Comm: kworker/23:0 Kdump: loaded Tainted: G W 6.8.0-rc1+ #1
> [1026677.257989] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022 [1026677.269367] Workqueue: ice ice_service_task [ice] [1026677.274592] RIP: 0010:ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice] [1026677.281421] Code: 0f 84 3a ff ff ff 41 0f b7 74 ec 02 66 89 b0 22 02 00 00 81 e6 ff 1f 00 00 e8 ec fd ff ff e9 35 ff ff ff 48 8b 43 30 49 63 ed <41> 0f b7 34 24 41 83 c5 01 48 8b 3c e8 66 89 b7 aa 02 00 00 81 e6 [1026677.300877] RSP: 0018:ff3be62a6399bcc0 EFLAGS: 00010202 [1026677.306556] RAX: ff28691e28980828 RBX: ff28691e41099828 RCX: 0000000000188000 [1026677.314148] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ff28691e41099828 [1026677.321730] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [1026677.329311] R10: 0000000000000007 R11: ffffffffffffffc0 R12: 0000000000000010 [1026677.336896] R13: 0000000000000000 R14: 0000000000000000 R15: ff28691e0eaa81a0 [1026677.344472] FS: 0000000000000000(0000) GS:ff28693cbffc0000(0000) knlGS:0000000000000000 [1026677.353000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1026677.359195] CR2: 0000000000000010 CR3: 0000000128df4001 CR4: 0000000000771ef0 [1026677.366779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1026677.374369] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [1026677.381952] PKRU: 55555554 [1026677.385116] Call Trace:
> [1026677.388023] <TASK>
> [1026677.390589] ? __die+0x20/0x70
> [1026677.394105] ? page_fault_oops+0x82/0x160 [1026677.398576] ? do_user_addr_fault+0x65/0x6a0 [1026677.403307] ? exc_page_fault+0x6a/0x150 [1026677.407694] ? asm_exc_page_fault+0x22/0x30 [1026677.412349] ? ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice] [1026677.418614] ice_vsi_rebuild+0x34b/0x3c0 [ice] [1026677.423583] ice_vsi_rebuild_by_type+0x76/0x180 [ice] [1026677.429147] ice_rebuild+0x18b/0x520 [ice] [1026677.433746] ? delay_tsc+0x8f/0xc0 [1026677.437630] ice_do_reset+0xa3/0x190 [ice] [1026677.442231] ice_service_task+0x26/0x440 [ice] [1026677.447180] process_one_work+0x174/0x340 [1026677.451669] worker_thread+0x27e/0x390 [1026677.455890] ? __pfx_worker_thread+0x10/0x10 [1026677.460627] kthread+0xee/0x120 [1026677.464235] ? __pfx_kthread+0x10/0x10 [1026677.468445] ret_from_fork+0x2d/0x50 [1026677.472476] ? __pfx_kthread+0x10/0x10 [1026677.476671] ret_from_fork_asm+0x1b/0x30 [1026677.481050] </TASK>
>
> Fixes: b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload")
> Reported-by: Robert Elliott <elliott@hpe.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> ---
> v2: fix uninitialized coalesce pointer on the exit path by moving the kfree to the later goto (simon), reword commit message (paul)
> ---
> drivers/net/ethernet/intel/ice/ice_lib.c | 18 +++++++++---------
> 1 file changed, 9 insertions(+), 9 deletions(-)
>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-03-12 15:16 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-05 23:02 [PATCH iwl-net v2] ice: fix memory corruption bug with suspend and rebuild Jesse Brandeburg
2024-03-06 20:04 ` Simon Horman
2024-03-06 20:08 ` [Intel-wired-lan] " Loktionov, Aleksandr
2024-03-12 15:16 ` Pucha, HimasekharX Reddy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).