public inbox for linux-rdma@vger.kernel.org
From: Patrisious Haddad <phaddad@nvidia.com>
To: Arnd Bergmann <arnd@kernel.org>,
	Leon Romanovsky <leon@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>
Cc: "Arnd Bergmann" <arnd@arndb.de>,
	"Christian Göttsche" <cgzones@googlemail.com>,
	"Serge Hallyn" <serge@hallyn.com>,
	"Chiara Meiohas" <cmeiohas@nvidia.com>,
	"Al Viro" <viro@zeniv.linux.org.uk>,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] RDMA/mlx5: reduce stack usage in mlx5_ib_ufile_hw_cleanup
Date: Tue, 10 Jun 2025 12:50:57 +0300	[thread overview]
Message-ID: <fa916ae4-1ed3-4f90-8577-3666ff0fe84a@nvidia.com> (raw)
In-Reply-To: <20250610092846.2642535-1-arnd@kernel.org>


On 6/10/2025 12:28 PM, Arnd Bergmann wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> This function has an array of eight mlx5_async_cmd structures, which
> often fits on the stack, but depending on the configuration can
> end up blowing the stack frame warning limit:
>
> drivers/infiniband/hw/mlx5/devx.c:2670:6: error: stack frame size (1392) exceeds limit (1280) in 'mlx5_ib_ufile_hw_cleanup' [-Werror,-Wframe-larger-than]
>
> Change this to a dynamic allocation instead. While a kmalloc()
> can theoretically fail, a GFP_KERNEL allocation under a page will
> block until memory has been freed up, so in the worst case, this
> only adds extra time in an already constrained environment.
>
> Fixes: 7c891a4dbcc1 ("RDMA/mlx5: Add implementation for ufile_hw_cleanup device operation")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>   drivers/infiniband/hw/mlx5/devx.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
> index 2479da8620ca..c3c0ea219ab7 100644
> --- a/drivers/infiniband/hw/mlx5/devx.c
> +++ b/drivers/infiniband/hw/mlx5/devx.c
> @@ -2669,7 +2669,7 @@ static void devx_wait_async_destroy(struct mlx5_async_cmd *cmd)
>
>   void mlx5_ib_ufile_hw_cleanup(struct ib_uverbs_file *ufile)
>   {
> -       struct mlx5_async_cmd async_cmd[MAX_ASYNC_CMDS];
> +       struct mlx5_async_cmd *async_cmd;
Please preserve the reverse Christmas tree declaration order.
>          struct ib_ucontext *ucontext = ufile->ucontext;
>          struct ib_device *device = ucontext->device;
>          struct mlx5_ib_dev *dev = to_mdev(device);
> @@ -2678,6 +2678,10 @@ void mlx5_ib_ufile_hw_cleanup(struct ib_uverbs_file *ufile)
>          int head = 0;
>          int tail = 0;
>
> +       async_cmd = kcalloc(MAX_ASYNC_CMDS, sizeof(*async_cmd), GFP_KERNEL);
> +       if (WARN_ON(!async_cmd))
> +               return;

But honestly I'm not sure I like this. The whole point of the original patch
was a performance optimization for the teardown flow, and this function is
called in a loop, not just once.

So I'm really not sure how much the kcalloc() would slow it down here, and
having it fail is a whole other issue.


I'm thinking out loud here, but theoretically we know both the stack frame
limit and this struct's size at compile time, so we should be able to add
some kind of ifdef check like "if (stack_frame_size < struct_size)" that
skips this function and maybe prints a warning.
(Since it is a purely optimizing function, the code would logically continue
correctly without it. But if it does need to be executed, then let it keep
the on-stack array and require a big enough stack, which most of today's
systems have anyway.)

> +
>          list_for_each_entry(uobject, &ufile->uobjects, list) {
>                  WARN_ON(uverbs_try_lock_object(uobject, UVERBS_LOOKUP_WRITE));
>
> @@ -2713,6 +2717,8 @@ void mlx5_ib_ufile_hw_cleanup(struct ib_uverbs_file *ufile)
>                  devx_wait_async_destroy(&async_cmd[head % MAX_ASYNC_CMDS]);
>                  head++;
>          }
> +
> +       kfree(async_cmd);
>   }
>
>   static ssize_t devx_async_cmd_event_read(struct file *filp, char __user *buf,
> --
> 2.39.5
>

Thread overview: 6+ messages
2025-06-10  9:28 [PATCH] RDMA/mlx5: reduce stack usage in mlx5_ib_ufile_hw_cleanup Arnd Bergmann
2025-06-10  9:50 ` Patrisious Haddad [this message]
2025-06-10 10:31   ` Arnd Bergmann
2025-06-10 14:51     ` Patrisious Haddad
2025-06-12  9:05 ` Leon Romanovsky
