From: Saeed Mahameed <saeed@kernel.org>
To: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Eric Dumazet <edumazet@google.com>,
Saeed Mahameed <saeedm@nvidia.com>,
netdev@vger.kernel.org, Tariq Toukan <tariqt@nvidia.com>,
Shay Drory <shayd@nvidia.com>, Moshe Shemesh <moshe@nvidia.com>,
Leon Romanovsky <leon@kernel.org>
Subject: Re: [RFC PATCH net] net/mlx5: Perform DMA operations cleanup before pci_disable_device()
Date: Thu, 12 Oct 2023 10:03:41 -0700 [thread overview]
Message-ID: <ZSgm7UtdVnr3OojW@x130> (raw)
In-Reply-To: <f93947daffa56e4cdf380ad644e78bcee1ad4183.camel@linux.ibm.com>
On 12 Oct 13:51, Niklas Schnelle wrote:
>On Wed, 2023-10-11 at 11:45 -0700, Saeed Mahameed wrote:
>> From: Shay Drory <shayd@nvidia.com>
>>
>> The cited patch change mlx5 driver so that during probe, DMA
>> operations were performed before pci_enable_device() and during
>> teardown, DMA operations were performed after pci_disable_device().
>> DMA operations require PCI to be enabled. Hence, The above leads to
>> the following oops in PPC systems[2].
>>
>> Fix it by performing the DMA operations during probe, after
>> pci_enable_device() and during teardown, before pci_disable_device().
>>
>> This also fixes a problem reported by Niklas Schnelle [1]
>>
>> [1] https://lore.kernel.org/lkml/20231011-mlx5_init_fix-v3-1-787ffb9183c6@linux.ibm.com/
>>
>> [2]
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
>> Modules linked in: xt_MASQUERADE nf_conntrack_netlink
>> nfnetlink xfrm_user iptable_nat xt_addrtype xt_conntrack nf_nat
>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 netconsole rpcsec_gss_krb5
>> auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser ib_umad
>> rdma_cm ib_ipoib iw_cm libiscsi scsi_transport_iscsi ib_cm ib_uverbs
>> ib_core mlx5_core(-) ptp pps_core fuse vmx_crypto crc32c_vpmsum [last
>> unloaded: mlx5_ib]
>> CPU: 1 PID: 8937 Comm: modprobe Not tainted 6.5.0-rc3_for_upstream_min_debug_2023_07_31_16_02 #1
>> Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
>> NIP: c000000000423388 LR: c0000000001e733c CTR: c0000000001e4720
>> REGS: c0000000055636d0 TRAP: 0380 Not tainted (6.5.0-rc3_for_upstream_min_debug_2023_07_31_16_02)
>> MSR: 8000000000009033 CR: 24008884 XER: 20040000
>> CFAR: c0000000001e7338 IRQMASK: 0
>> NIP [c000000000423388] __free_pages+0x28/0x160
>> LR [c0000000001e733c] dma_direct_free+0xac/0x190
>> Call Trace:
>> [c000000005563970] [5deadbeef0000100] 0x5deadbeef0000100 (unreliable)
>> [c0000000055639b0] [c0000000003d46cc] kfree+0x7c/0x150
>> [c000000005563a40] [c0000000001e47c8] dma_free_attrs+0xa8/0x1a0
>> [c000000005563aa0] [c008000000d0064c] mlx5_cmd_cleanup+0xa4/0x100 [mlx5_core]
>> [c000000005563ad0] [c008000000cf629c] mlx5_mdev_uninit+0xf4/0x140 [mlx5_core]
>> [c000000005563b00] [c008000000cf6448] remove_one+0x160/0x1d0 [mlx5_core]
>> [c000000005563b40] [c000000000958540] pci_device_remove+0x60/0x110
>> [c000000005563b80] [c000000000a35e80] device_remove+0x70/0xd0
>> [c000000005563bb0] [c000000000a37a38] device_release_driver_internal+0x2a8/0x330
>> [c000000005563c00] [c000000000a37b8c] driver_detach+0x8c/0x160
>> [c000000005563c40] [c000000000a35350] bus_remove_driver+0x90/0x110
>> [c000000005563c80] [c000000000a38948] driver_unregister+0x48/0x90
>> [c000000005563cf0] [c000000000957e38] pci_unregister_driver+0x38/0x150
>> [c000000005563d40] [c008000000eb6140] mlx5_cleanup+0x38/0x90 [mlx5_core]
>>
>> Fixes: 06cd555f73ca ("net/mlx5: split mlx5_cmd_init() to probe and reload routines")
>> Signed-off-by: Shay Drory <shayd@nvidia.com>
>> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
>> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
>> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
>> CC: Leon Romanovsky <leon@kernel.org>
>> CC: Niklas Schnelle <schnelle@linux.ibm.com>
>>
>> ---
>> drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 62 ++++++++-----------
>> 1 file changed, 27 insertions(+), 35 deletions(-)
>
>
>I can confirm that this indeed fixes the problem I was seeing. I also
>tested hot unplug and re-plug as well as some smoke tests with the
>devices at hand across multiple ConnectX generations.
>So feel free to add my:
>
>Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
>
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
>> index afb348579577..dd36d9cba62f 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
>> @@ -2186,52 +2186,23 @@ static u16 cmdif_rev(struct mlx5_core_dev *dev)
>>
>> int mlx5_cmd_init(struct mlx5_core_dev *dev)
>> {
>> - int size = sizeof(struct mlx5_cmd_prot_block);
>> - int align = roundup_pow_of_two(size);
>> struct mlx5_cmd *cmd = &dev->cmd;
>> - u32 cmd_l;
>> - int err;
>> -
>> - cmd->pool = dma_pool_create("mlx5_cmd", mlx5_core_dma_dev(dev), size, align, 0);
>> - if (!cmd->pool)
>> - return -ENOMEM;
>>
>> - err = alloc_cmd_page(dev, cmd);
>> - if (err)
>> - goto err_free_pool;
>> -
>> - cmd_l = (u32)(cmd->dma);
>> - if (cmd_l & 0xfff) {
>> - mlx5_core_err(dev, "invalid command queue address\n");
>> - err = -ENOMEM;
>> - goto err_cmd_page;
>> - }
>> cmd->checksum_disabled = 1;
>>
>> spin_lock_init(&cmd->alloc_lock);
>> spin_lock_init(&cmd->token_lock);
>>
>> - create_msg_cache(dev);
>> -
>> set_wqname(dev);
>> cmd->wq = create_singlethread_workqueue(cmd->wq_name);
>> if (!cmd->wq) {
>> mlx5_core_err(dev, "failed to create command workqueue\n");
>> - err = -ENOMEM;
>> - goto err_cache;
>> + return -ENOMEM;
>> }
>>
>> mlx5_cmdif_debugfs_init(dev);
>>
>> return 0;
>> -
>> -err_cache:
>> - destroy_msg_cache(dev);
>> -err_cmd_page:
>> - free_cmd_page(dev, cmd);
>> -err_free_pool:
>> - dma_pool_destroy(cmd->pool);
>> - return err;
>> }
>
>I like that this leaves mlx5_cmd_init() simpler.
>
>>
>> void mlx5_cmd_cleanup(struct mlx5_core_dev *dev)
>> @@ -2240,15 +2211,15 @@ void mlx5_cmd_cleanup(struct mlx5_core_dev *dev)
>>
>> mlx5_cmdif_debugfs_cleanup(dev);
>> destroy_workqueue(cmd->wq);
>> - destroy_msg_cache(dev);
>> - free_cmd_page(dev, cmd);
>> - dma_pool_destroy(cmd->pool);
>> }
>>
>> int mlx5_cmd_enable(struct mlx5_core_dev *dev)
>> {
>> + int size = sizeof(struct mlx5_cmd_prot_block);
>> + int align = roundup_pow_of_two(size);
>> struct mlx5_cmd *cmd = &dev->cmd;
>> u32 cmd_h, cmd_l;
>> + int err;
>>
>> memset(&cmd->vars, 0, sizeof(cmd->vars));
>> cmd->vars.cmdif_rev = cmdif_rev(dev);
>> @@ -2281,10 +2252,21 @@ int mlx5_cmd_enable(struct mlx5_core_dev *dev)
>> sema_init(&cmd->vars.pages_sem, 1);
>> sema_init(&cmd->vars.throttle_sem, DIV_ROUND_UP(cmd->vars.max_reg_cmds, 2));
>>
>> + cmd->pool = dma_pool_create("mlx5_cmd", mlx5_core_dma_dev(dev), size, align, 0);
>> + if (!cmd->pool)
>> + return -ENOMEM;
>> +
>> + err = alloc_cmd_page(dev, cmd);
>> + if (err)
>> + goto err_free_pool;
>> +
>> cmd_h = (u32)((u64)(cmd->dma) >> 32);
>> cmd_l = (u32)(cmd->dma);
>> - if (WARN_ON(cmd_l & 0xfff))
>> - return -EINVAL;
>> + if (cmd_l & 0xfff) {
>> + mlx5_core_err(dev, "invalid command queue address\n");
>> + err = -ENOMEM;
>> + goto err_cmd_page;
>> + }
>>
>> iowrite32be(cmd_h, &dev->iseg->cmdq_addr_h);
>> iowrite32be(cmd_l, &dev->iseg->cmdq_addr_l_sz);
>> @@ -2297,16 +2279,26 @@ int mlx5_cmd_enable(struct mlx5_core_dev *dev)
>> cmd->mode = CMD_MODE_POLLING;
>> cmd->allowed_opcode = CMD_ALLOWED_OPCODE_ALL;
>>
>> + create_msg_cache(dev);
>> create_debugfs_files(dev);
>>
>> return 0;
>> +
>> +err_cmd_page:
>> + free_cmd_page(dev, cmd);
>> +err_free_pool:
>> + dma_pool_destroy(cmd->pool);
>> + return err;
>> }
>>
>> void mlx5_cmd_disable(struct mlx5_core_dev *dev)
>> {
>> struct mlx5_cmd *cmd = &dev->cmd;
>>
>> + destroy_msg_cache(dev);
>> clean_debug_files(dev);
>> + free_cmd_page(dev, cmd);
>> + dma_pool_destroy(cmd->pool);
>> flush_workqueue(cmd->wq);
>> }
>
>I do like that this tears down the DMA stuff before
>pci_disable_device() I don't think this fixes a problem on s390x but to
>me it's the more natural order. That said, are we sure that the
>flush_workqueue() won't still need DMA and cmd page?
>
Agreed, tt is safer to flush_workqueue first thing in the function.
>>
>> --
>> 2.41.0
>>
>
prev parent reply other threads:[~2023-10-12 17:03 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-11 18:45 [RFC PATCH net] net/mlx5: Perform DMA operations cleanup before pci_disable_device() Saeed Mahameed
2023-10-12 11:51 ` Niklas Schnelle
2023-10-12 17:03 ` Saeed Mahameed [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZSgm7UtdVnr3OojW@x130 \
--to=saeed@kernel.org \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=leon@kernel.org \
--cc=moshe@nvidia.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=saeedm@nvidia.com \
--cc=schnelle@linux.ibm.com \
--cc=shayd@nvidia.com \
--cc=tariqt@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.