public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] net/mlx5e: Fix a race in command alloc flow
@ 2023-11-21 11:52 Shifeng Li
  2023-11-22 12:02 ` Leon Romanovsky
  2023-11-26 16:13 ` Moshe Shemesh
  0 siblings, 2 replies; 6+ messages in thread
From: Shifeng Li @ 2023-11-21 11:52 UTC (permalink / raw)
  To: saeedm, leon, davem, edumazet, kuba, pabeni, jackm, ogerlitz,
	roland, eli
  Cc: dinghui, netdev, linux-rdma, linux-kernel, Shifeng Li

Fix a cmd->ent use after free due to a race on command entry.
Such race occurs when one of the commands releases its last refcount and
frees its index and entry while another process running command flush
flow takes refcount to this command entry. The process which handles
commands flush may see this command as needed to be flushed if the other
process allocated a ent->idx but didn't set ent to cmd->ent_arr in
cmd_work_handler(). Fix it by moving the assignment of cmd->ent_arr into
the spin lock.

[70013.081955] BUG: KASAN: use-after-free in mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
[70013.081967] Write of size 4 at addr ffff88880b1510b4 by task kworker/26:1/1433361
[70013.081968]
[70013.081989] CPU: 26 PID: 1433361 Comm: kworker/26:1 Kdump: loaded Tainted: G           OE     4.19.90-25.17.v2101.osc.sfc.6.10.0.0030.ky10.x86_64+debug #1
[70013.082001] Hardware name: SANGFOR 65N32-US/ASERVER-G-2605, BIOS SSSS5203 08/19/2020
[70013.082028] Workqueue: events aer_isr
[70013.082053] Call Trace:
[70013.082067]  dump_stack+0x8b/0xbb
[70013.082086]  print_address_description+0x6a/0x270
[70013.082102]  kasan_report+0x179/0x2c0
[70013.082133]  ? mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
[70013.082173]  mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
[70013.082213]  ? mlx5_cmd_use_polling+0x20/0x20 [mlx5_core]
[70013.082223]  ? kmem_cache_free+0x1ad/0x1e0
[70013.082267]  mlx5_cmd_flush+0x80/0x180 [mlx5_core]
[70013.082304]  mlx5_enter_error_state+0x106/0x1d0 [mlx5_core]
[70013.082338]  mlx5_try_fast_unload+0x2ea/0x4d0 [mlx5_core]
[70013.082377]  remove_one+0x200/0x2b0 [mlx5_core]
[70013.082390]  ? __pm_runtime_resume+0x58/0x70
[70013.082409]  pci_device_remove+0xf3/0x280
[70013.082426]  ? pcibios_free_irq+0x10/0x10
[70013.082439]  device_release_driver_internal+0x1c3/0x470
[70013.082453]  pci_stop_bus_device+0x109/0x160
[70013.082468]  pci_stop_and_remove_bus_device+0xe/0x20
[70013.082485]  pcie_do_fatal_recovery+0x167/0x550
[70013.082493]  aer_isr+0x7d2/0x960
[70013.082510]  ? aer_get_device_error_info+0x420/0x420
[70013.082526]  ? __schedule+0x821/0x2040
[70013.082536]  ? strscpy+0x85/0x180
[70013.082543]  process_one_work+0x65f/0x12d0
[70013.082556]  worker_thread+0x87/0xb50
[70013.082563]  ? __kthread_parkme+0x82/0xf0
[70013.082569]  ? process_one_work+0x12d0/0x12d0
[70013.082571]  kthread+0x2e9/0x3a0
[70013.082579]  ? kthread_create_worker_on_cpu+0xc0/0xc0
[70013.082592]  ret_from_fork+0x1f/0x40

Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Shifeng Li <lishifeng1992@126.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)
---
v1->v2: fix code conflicts.

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index f8f0a712c943..a7b1f9686c09 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -156,15 +156,18 @@ static u8 alloc_token(struct mlx5_cmd *cmd)
 	return token;
 }
 
-static int cmd_alloc_index(struct mlx5_cmd *cmd)
+static int cmd_alloc_index(struct mlx5_cmd *cmd, struct mlx5_cmd_work_ent *ent)
 {
 	unsigned long flags;
 	int ret;
 
 	spin_lock_irqsave(&cmd->alloc_lock, flags);
 	ret = find_first_bit(&cmd->vars.bitmask, cmd->vars.max_reg_cmds);
-	if (ret < cmd->vars.max_reg_cmds)
+	if (ret < cmd->vars.max_reg_cmds) {
 		clear_bit(ret, &cmd->vars.bitmask);
+		ent->idx = ret;
+		cmd->ent_arr[ent->idx] = ent;
+	}
 	spin_unlock_irqrestore(&cmd->alloc_lock, flags);
 
 	return ret < cmd->vars.max_reg_cmds ? ret : -ENOMEM;
@@ -979,7 +982,7 @@ static void cmd_work_handler(struct work_struct *work)
 	sem = ent->page_queue ? &cmd->vars.pages_sem : &cmd->vars.sem;
 	down(sem);
 	if (!ent->page_queue) {
-		alloc_ret = cmd_alloc_index(cmd);
+		alloc_ret = cmd_alloc_index(cmd, ent);
 		if (alloc_ret < 0) {
 			mlx5_core_err_rl(dev, "failed to allocate command entry\n");
 			if (ent->callback) {
@@ -994,15 +997,14 @@ static void cmd_work_handler(struct work_struct *work)
 			up(sem);
 			return;
 		}
-		ent->idx = alloc_ret;
 	} else {
 		ent->idx = cmd->vars.max_reg_cmds;
 		spin_lock_irqsave(&cmd->alloc_lock, flags);
 		clear_bit(ent->idx, &cmd->vars.bitmask);
+		cmd->ent_arr[ent->idx] = ent;
 		spin_unlock_irqrestore(&cmd->alloc_lock, flags);
 	}
 
-	cmd->ent_arr[ent->idx] = ent;
 	lay = get_inst(cmd, ent->idx);
 	ent->lay = lay;
 	memset(lay, 0, sizeof(*lay));
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] net/mlx5e: Fix a race in command alloc flow
  2023-11-21 11:52 [PATCH v2] net/mlx5e: Fix a race in command alloc flow Shifeng Li
@ 2023-11-22 12:02 ` Leon Romanovsky
  2023-11-23  2:20   ` Shifeng Li
  2023-11-23 12:45   ` Shifeng Li
  2023-11-26 16:13 ` Moshe Shemesh
  1 sibling, 2 replies; 6+ messages in thread
From: Leon Romanovsky @ 2023-11-22 12:02 UTC (permalink / raw)
  To: Shifeng Li
  Cc: saeedm, davem, edumazet, kuba, pabeni, jackm, ogerlitz, roland,
	eli, dinghui, netdev, linux-rdma, linux-kernel

On Tue, Nov 21, 2023 at 03:52:51AM -0800, Shifeng Li wrote:
> Fix a cmd->ent use after free due to a race on command entry.
> Such race occurs when one of the commands releases its last refcount and
> frees its index and entry while another process running command flush
> flow takes refcount to this command entry. The process which handles
> commands flush may see this command as needed to be flushed if the other
> process allocated a ent->idx but didn't set ent to cmd->ent_arr in
> cmd_work_handler(). Fix it by moving the assignment of cmd->ent_arr into
> the spin lock.
> 
> [70013.081955] BUG: KASAN: use-after-free in mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
> [70013.081967] Write of size 4 at addr ffff88880b1510b4 by task kworker/26:1/1433361
> [70013.081968]
> [70013.081989] CPU: 26 PID: 1433361 Comm: kworker/26:1 Kdump: loaded Tainted: G           OE     4.19.90-25.17.v2101.osc.sfc.6.10.0.0030.ky10.x86_64+debug #1
> [70013.082001] Hardware name: SANGFOR 65N32-US/ASERVER-G-2605, BIOS SSSS5203 08/19/2020
> [70013.082028] Workqueue: events aer_isr
> [70013.082053] Call Trace:
> [70013.082067]  dump_stack+0x8b/0xbb
> [70013.082086]  print_address_description+0x6a/0x270
> [70013.082102]  kasan_report+0x179/0x2c0
> [70013.082133]  ? mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
> [70013.082173]  mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
> [70013.082213]  ? mlx5_cmd_use_polling+0x20/0x20 [mlx5_core]
> [70013.082223]  ? kmem_cache_free+0x1ad/0x1e0
> [70013.082267]  mlx5_cmd_flush+0x80/0x180 [mlx5_core]
> [70013.082304]  mlx5_enter_error_state+0x106/0x1d0 [mlx5_core]
> [70013.082338]  mlx5_try_fast_unload+0x2ea/0x4d0 [mlx5_core]
> [70013.082377]  remove_one+0x200/0x2b0 [mlx5_core]
> [70013.082390]  ? __pm_runtime_resume+0x58/0x70
> [70013.082409]  pci_device_remove+0xf3/0x280
> [70013.082426]  ? pcibios_free_irq+0x10/0x10
> [70013.082439]  device_release_driver_internal+0x1c3/0x470
> [70013.082453]  pci_stop_bus_device+0x109/0x160
> [70013.082468]  pci_stop_and_remove_bus_device+0xe/0x20
> [70013.082485]  pcie_do_fatal_recovery+0x167/0x550
> [70013.082493]  aer_isr+0x7d2/0x960
> [70013.082510]  ? aer_get_device_error_info+0x420/0x420
> [70013.082526]  ? __schedule+0x821/0x2040
> [70013.082536]  ? strscpy+0x85/0x180
> [70013.082543]  process_one_work+0x65f/0x12d0
> [70013.082556]  worker_thread+0x87/0xb50
> [70013.082563]  ? __kthread_parkme+0x82/0xf0
> [70013.082569]  ? process_one_work+0x12d0/0x12d0
> [70013.082571]  kthread+0x2e9/0x3a0
> [70013.082579]  ? kthread_create_worker_on_cpu+0xc0/0xc0
> [70013.082592]  ret_from_fork+0x1f/0x40

I'm curious how did you get this error? I would expect to see some sort
of lock in upper level which prevents it.

Thanks

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] net/mlx5e: Fix a race in command alloc flow
  2023-11-22 12:02 ` Leon Romanovsky
@ 2023-11-23  2:20   ` Shifeng Li
  2023-11-23 12:45   ` Shifeng Li
  1 sibling, 0 replies; 6+ messages in thread
From: Shifeng Li @ 2023-11-23  2:20 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: saeedm, davem, edumazet, kuba, pabeni, jackm, ogerlitz, roland,
	eli, dinghui, netdev, linux-rdma, linux-kernel

On 2023/11/22 20:02, Leon Romanovsky wrote:
> On Tue, Nov 21, 2023 at 03:52:51AM -0800, Shifeng Li wrote:
>> Fix a cmd->ent use after free due to a race on command entry.
>> Such race occurs when one of the commands releases its last refcount and
>> frees its index and entry while another process running command flush
>> flow takes refcount to this command entry. The process which handles
>> commands flush may see this command as needed to be flushed if the other
>> process allocated a ent->idx but didn't set ent to cmd->ent_arr in
>> cmd_work_handler(). Fix it by moving the assignment of cmd->ent_arr into
>> the spin lock.
>>
>> [70013.081955] BUG: KASAN: use-after-free in mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
>> [70013.081967] Write of size 4 at addr ffff88880b1510b4 by task kworker/26:1/1433361
>> [70013.081968]
>> [70013.081989] CPU: 26 PID: 1433361 Comm: kworker/26:1 Kdump: loaded Tainted: G           OE     4.19.90-25.17.v2101.osc.sfc.6.10.0.0030.ky10.x86_64+debug #1
>> [70013.082001] Hardware name: SANGFOR 65N32-US/ASERVER-G-2605, BIOS SSSS5203 08/19/2020
>> [70013.082028] Workqueue: events aer_isr
>> [70013.082053] Call Trace:
>> [70013.082067]  dump_stack+0x8b/0xbb
>> [70013.082086]  print_address_description+0x6a/0x270
>> [70013.082102]  kasan_report+0x179/0x2c0
>> [70013.082133]  ? mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
>> [70013.082173]  mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
>> [70013.082213]  ? mlx5_cmd_use_polling+0x20/0x20 [mlx5_core]
>> [70013.082223]  ? kmem_cache_free+0x1ad/0x1e0
>> [70013.082267]  mlx5_cmd_flush+0x80/0x180 [mlx5_core]
>> [70013.082304]  mlx5_enter_error_state+0x106/0x1d0 [mlx5_core]
>> [70013.082338]  mlx5_try_fast_unload+0x2ea/0x4d0 [mlx5_core]
>> [70013.082377]  remove_one+0x200/0x2b0 [mlx5_core]
>> [70013.082390]  ? __pm_runtime_resume+0x58/0x70
>> [70013.082409]  pci_device_remove+0xf3/0x280
>> [70013.082426]  ? pcibios_free_irq+0x10/0x10
>> [70013.082439]  device_release_driver_internal+0x1c3/0x470
>> [70013.082453]  pci_stop_bus_device+0x109/0x160
>> [70013.082468]  pci_stop_and_remove_bus_device+0xe/0x20
>> [70013.082485]  pcie_do_fatal_recovery+0x167/0x550
>> [70013.082493]  aer_isr+0x7d2/0x960
>> [70013.082510]  ? aer_get_device_error_info+0x420/0x420
>> [70013.082526]  ? __schedule+0x821/0x2040
>> [70013.082536]  ? strscpy+0x85/0x180
>> [70013.082543]  process_one_work+0x65f/0x12d0
>> [70013.082556]  worker_thread+0x87/0xb50
>> [70013.082563]  ? __kthread_parkme+0x82/0xf0
>> [70013.082569]  ? process_one_work+0x12d0/0x12d0
>> [70013.082571]  kthread+0x2e9/0x3a0
>> [70013.082579]  ? kthread_create_worker_on_cpu+0xc0/0xc0
>> [70013.082592]  ret_from_fork+0x1f/0x40
> 
> I'm curious how did you get this error? I would expect to see some sort
> of lock in upper level which prevents it.
> 
Just inject AER unrecoverable error to pci BDF device corresponding to 
the network card constantly and I get this error.

Thanks
> Thanks


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] net/mlx5e: Fix a race in command alloc flow
  2023-11-22 12:02 ` Leon Romanovsky
  2023-11-23  2:20   ` Shifeng Li
@ 2023-11-23 12:45   ` Shifeng Li
  1 sibling, 0 replies; 6+ messages in thread
From: Shifeng Li @ 2023-11-23 12:45 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: saeedm, davem, edumazet, kuba, pabeni, jackm, ogerlitz, roland,
	eli, dinghui, netdev, linux-rdma, linux-kernel

On 2023/11/22 20:02, Leon Romanovsky wrote:
> On Tue, Nov 21, 2023 at 03:52:51AM -0800, Shifeng Li wrote:
>> Fix a cmd->ent use after free due to a race on command entry.
>> Such race occurs when one of the commands releases its last refcount and
>> frees its index and entry while another process running command flush
>> flow takes refcount to this command entry. The process which handles
>> commands flush may see this command as needed to be flushed if the other
>> process allocated a ent->idx but didn't set ent to cmd->ent_arr in
>> cmd_work_handler(). Fix it by moving the assignment of cmd->ent_arr into
>> the spin lock.
>>
>> [70013.081955] BUG: KASAN: use-after-free in mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
>> [70013.081967] Write of size 4 at addr ffff88880b1510b4 by task kworker/26:1/1433361
>> [70013.081968]
>> [70013.081989] CPU: 26 PID: 1433361 Comm: kworker/26:1 Kdump: loaded Tainted: G           OE     4.19.90-25.17.v2101.osc.sfc.6.10.0.0030.ky10.x86_64+debug #1
>> [70013.082001] Hardware name: SANGFOR 65N32-US/ASERVER-G-2605, BIOS SSSS5203 08/19/2020
>> [70013.082028] Workqueue: events aer_isr
>> [70013.082053] Call Trace:
>> [70013.082067]  dump_stack+0x8b/0xbb
>> [70013.082086]  print_address_description+0x6a/0x270
>> [70013.082102]  kasan_report+0x179/0x2c0
>> [70013.082133]  ? mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
>> [70013.082173]  mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
>> [70013.082213]  ? mlx5_cmd_use_polling+0x20/0x20 [mlx5_core]
>> [70013.082223]  ? kmem_cache_free+0x1ad/0x1e0
>> [70013.082267]  mlx5_cmd_flush+0x80/0x180 [mlx5_core]
>> [70013.082304]  mlx5_enter_error_state+0x106/0x1d0 [mlx5_core]
>> [70013.082338]  mlx5_try_fast_unload+0x2ea/0x4d0 [mlx5_core]
>> [70013.082377]  remove_one+0x200/0x2b0 [mlx5_core]
>> [70013.082390]  ? __pm_runtime_resume+0x58/0x70
>> [70013.082409]  pci_device_remove+0xf3/0x280
>> [70013.082426]  ? pcibios_free_irq+0x10/0x10
>> [70013.082439]  device_release_driver_internal+0x1c3/0x470
>> [70013.082453]  pci_stop_bus_device+0x109/0x160
>> [70013.082468]  pci_stop_and_remove_bus_device+0xe/0x20
>> [70013.082485]  pcie_do_fatal_recovery+0x167/0x550
>> [70013.082493]  aer_isr+0x7d2/0x960
>> [70013.082510]  ? aer_get_device_error_info+0x420/0x420
>> [70013.082526]  ? __schedule+0x821/0x2040
>> [70013.082536]  ? strscpy+0x85/0x180
>> [70013.082543]  process_one_work+0x65f/0x12d0
>> [70013.082556]  worker_thread+0x87/0xb50
>> [70013.082563]  ? __kthread_parkme+0x82/0xf0
>> [70013.082569]  ? process_one_work+0x12d0/0x12d0
>> [70013.082571]  kthread+0x2e9/0x3a0
>> [70013.082579]  ? kthread_create_worker_on_cpu+0xc0/0xc0
>> [70013.082592]  ret_from_fork+0x1f/0x40
> 
> I'm curious how did you get this error? I would expect to see some sort
> of lock in upper level which prevents it.
> 
The logical relationship of this error is as follows:

                   aer_recover_work                    |             ent->work
------------------------------------------------------+---------------------------------
aer_recover_work_func                                 |
|- pcie_do_recovery                                   |
   |- report_error_detected                            |
     |- mlx5_pci_err_detected                          |cmd_work_handler
       |- mlx5_enter_error_state                       |  |- cmd_alloc_index
         |- enter_error_state                          |    |- lock cmd->alloc_lock
           |- mlx5_cmd_flush                           |    |- clear_bit
             |- mlx5_cmd_trigger_completions           |    |- unlock cmd->alloc_lock
               |- lock cmd->alloc_lock                 |
               |- vector = ~dev->cmd.vars.bitmask      |
               |- for_each_set_bit                     |
                 |- cmd_ent_get(cmd->ent_arr[i]) (UAF) |
               |- unlock cmd->alloc_lock               |  |- cmd->ent_arr[ent->idx] = ent

The cmd->ent_arr[ent->idx] assignment and the bit clearing are not protected by the cmd->alloc_lock in cmd_work_handler().

> Thanks


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] net/mlx5e: Fix a race in command alloc flow
  2023-11-21 11:52 [PATCH v2] net/mlx5e: Fix a race in command alloc flow Shifeng Li
  2023-11-22 12:02 ` Leon Romanovsky
@ 2023-11-26 16:13 ` Moshe Shemesh
  2023-11-30  3:10   ` Shifeng Li
  1 sibling, 1 reply; 6+ messages in thread
From: Moshe Shemesh @ 2023-11-26 16:13 UTC (permalink / raw)
  To: Shifeng Li, saeedm, leon, davem, edumazet, kuba, pabeni, jackm,
	ogerlitz, roland, eli
  Cc: dinghui, netdev, linux-rdma, linux-kernel



On 11/21/2023 1:52 PM, Shifeng Li wrote:
> Fix a cmd->ent use after free due to a race on command entry.
> Such race occurs when one of the commands releases its last refcount and
> frees its index and entry while another process running command flush
> flow takes refcount to this command entry. The process which handles
> commands flush may see this command as needed to be flushed if the other
> process allocated a ent->idx but didn't set ent to cmd->ent_arr in
> cmd_work_handler(). Fix it by moving the assignment of cmd->ent_arr into
> the spin lock.
> 
> [70013.081955] BUG: KASAN: use-after-free in mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
> [70013.081967] Write of size 4 at addr ffff88880b1510b4 by task kworker/26:1/1433361
> [70013.081968]
> [70013.081989] CPU: 26 PID: 1433361 Comm: kworker/26:1 Kdump: loaded Tainted: G           OE     4.19.90-25.17.v2101.osc.sfc.6.10.0.0030.ky10.x86_64+debug #1
> [70013.082001] Hardware name: SANGFOR 65N32-US/ASERVER-G-2605, BIOS SSSS5203 08/19/2020
> [70013.082028] Workqueue: events aer_isr
> [70013.082053] Call Trace:
> [70013.082067]  dump_stack+0x8b/0xbb
> [70013.082086]  print_address_description+0x6a/0x270
> [70013.082102]  kasan_report+0x179/0x2c0
> [70013.082133]  ? mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
> [70013.082173]  mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
> [70013.082213]  ? mlx5_cmd_use_polling+0x20/0x20 [mlx5_core]
> [70013.082223]  ? kmem_cache_free+0x1ad/0x1e0
> [70013.082267]  mlx5_cmd_flush+0x80/0x180 [mlx5_core]
> [70013.082304]  mlx5_enter_error_state+0x106/0x1d0 [mlx5_core]
> [70013.082338]  mlx5_try_fast_unload+0x2ea/0x4d0 [mlx5_core]
> [70013.082377]  remove_one+0x200/0x2b0 [mlx5_core]
> [70013.082390]  ? __pm_runtime_resume+0x58/0x70
> [70013.082409]  pci_device_remove+0xf3/0x280
> [70013.082426]  ? pcibios_free_irq+0x10/0x10
> [70013.082439]  device_release_driver_internal+0x1c3/0x470
> [70013.082453]  pci_stop_bus_device+0x109/0x160
> [70013.082468]  pci_stop_and_remove_bus_device+0xe/0x20
> [70013.082485]  pcie_do_fatal_recovery+0x167/0x550
> [70013.082493]  aer_isr+0x7d2/0x960
> [70013.082510]  ? aer_get_device_error_info+0x420/0x420
> [70013.082526]  ? __schedule+0x821/0x2040
> [70013.082536]  ? strscpy+0x85/0x180
> [70013.082543]  process_one_work+0x65f/0x12d0
> [70013.082556]  worker_thread+0x87/0xb50
> [70013.082563]  ? __kthread_parkme+0x82/0xf0
> [70013.082569]  ? process_one_work+0x12d0/0x12d0
> [70013.082571]  kthread+0x2e9/0x3a0
> [70013.082579]  ? kthread_create_worker_on_cpu+0xc0/0xc0
> [70013.082592]  ret_from_fork+0x1f/0x40
> 
> Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
> Signed-off-by: Shifeng Li<lishifeng1992@126.com>

Fixes tag should be :
Fixes: 50b2412b7e78 ("net/mlx5: Avoid possible free of command entry 
while timeout comp handler")

Reviewed-by: Moshe Shemesh <moshe@nvidia.com>

Thanks!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] net/mlx5e: Fix a race in command alloc flow
  2023-11-26 16:13 ` Moshe Shemesh
@ 2023-11-30  3:10   ` Shifeng Li
  0 siblings, 0 replies; 6+ messages in thread
From: Shifeng Li @ 2023-11-30  3:10 UTC (permalink / raw)
  To: Moshe Shemesh, saeedm, leon, davem, edumazet, kuba, pabeni, jackm,
	ogerlitz, roland, eli
  Cc: dinghui, netdev, linux-rdma, linux-kernel

On 2023/11/27 0:13, Moshe Shemesh wrote:
> 
> 
> On 11/21/2023 1:52 PM, Shifeng Li wrote:
>> Fix a cmd->ent use after free due to a race on command entry.
>> Such race occurs when one of the commands releases its last refcount and
>> frees its index and entry while another process running command flush
>> flow takes refcount to this command entry. The process which handles
>> commands flush may see this command as needed to be flushed if the other
>> process allocated a ent->idx but didn't set ent to cmd->ent_arr in
>> cmd_work_handler(). Fix it by moving the assignment of cmd->ent_arr into
>> the spin lock.
>>
>> [70013.081955] BUG: KASAN: use-after-free in mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
>> [70013.081967] Write of size 4 at addr ffff88880b1510b4 by task kworker/26:1/1433361
>> [70013.081968]
>> [70013.081989] CPU: 26 PID: 1433361 Comm: kworker/26:1 Kdump: loaded Tainted: G           OE     4.19.90-25.17.v2101.osc.sfc.6.10.0.0030.ky10.x86_64+debug #1
>> [70013.082001] Hardware name: SANGFOR 65N32-US/ASERVER-G-2605, BIOS SSSS5203 08/19/2020
>> [70013.082028] Workqueue: events aer_isr
>> [70013.082053] Call Trace:
>> [70013.082067]  dump_stack+0x8b/0xbb
>> [70013.082086]  print_address_description+0x6a/0x270
>> [70013.082102]  kasan_report+0x179/0x2c0
>> [70013.082133]  ? mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
>> [70013.082173]  mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
>> [70013.082213]  ? mlx5_cmd_use_polling+0x20/0x20 [mlx5_core]
>> [70013.082223]  ? kmem_cache_free+0x1ad/0x1e0
>> [70013.082267]  mlx5_cmd_flush+0x80/0x180 [mlx5_core]
>> [70013.082304]  mlx5_enter_error_state+0x106/0x1d0 [mlx5_core]
>> [70013.082338]  mlx5_try_fast_unload+0x2ea/0x4d0 [mlx5_core]
>> [70013.082377]  remove_one+0x200/0x2b0 [mlx5_core]
>> [70013.082390]  ? __pm_runtime_resume+0x58/0x70
>> [70013.082409]  pci_device_remove+0xf3/0x280
>> [70013.082426]  ? pcibios_free_irq+0x10/0x10
>> [70013.082439]  device_release_driver_internal+0x1c3/0x470
>> [70013.082453]  pci_stop_bus_device+0x109/0x160
>> [70013.082468]  pci_stop_and_remove_bus_device+0xe/0x20
>> [70013.082485]  pcie_do_fatal_recovery+0x167/0x550
>> [70013.082493]  aer_isr+0x7d2/0x960
>> [70013.082510]  ? aer_get_device_error_info+0x420/0x420
>> [70013.082526]  ? __schedule+0x821/0x2040
>> [70013.082536]  ? strscpy+0x85/0x180
>> [70013.082543]  process_one_work+0x65f/0x12d0
>> [70013.082556]  worker_thread+0x87/0xb50
>> [70013.082563]  ? __kthread_parkme+0x82/0xf0
>> [70013.082569]  ? process_one_work+0x12d0/0x12d0
>> [70013.082571]  kthread+0x2e9/0x3a0
>> [70013.082579]  ? kthread_create_worker_on_cpu+0xc0/0xc0
>> [70013.082592]  ret_from_fork+0x1f/0x40
>>
>> Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
>> Signed-off-by: Shifeng Li<lishifeng1992@126.com>
> 
> Fixes tag should be :
> Fixes: 50b2412b7e78 ("net/mlx5: Avoid possible free of command entry while timeout comp handler")
> 
> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
> 
I have sent v3.

Thanks!

> Thanks!


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-11-30  3:18 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-21 11:52 [PATCH v2] net/mlx5e: Fix a race in command alloc flow Shifeng Li
2023-11-22 12:02 ` Leon Romanovsky
2023-11-23  2:20   ` Shifeng Li
2023-11-23 12:45   ` Shifeng Li
2023-11-26 16:13 ` Moshe Shemesh
2023-11-30  3:10   ` Shifeng Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox