* [PATCH] RDMA/siw: work around clang stack size warning
From: Arnd Bergmann @ 2025-06-20 11:43 UTC
To: Bernard Metzler, Jason Gunthorpe, Leon Romanovsky,
Nathan Chancellor
Cc: Arnd Bergmann, Nick Desaulniers, Bill Wendling, Justin Stitt,
Potnuri Bharat Teja, Showrya M N, Eric Biggers, linux-rdma,
linux-kernel, llvm
From: Arnd Bergmann <arnd@arndb.de>
clang inlines a lot of functions into siw_qp_sq_process(), with the
aggregate stack frame blowing the warning limit in some configurations:
drivers/infiniband/sw/siw/siw_qp_tx.c:1014:5: error: stack frame size (1544) exceeds limit (1280) in 'siw_qp_sq_process' [-Werror,-Wframe-larger-than]
The real problem here is the array of kvec structures in siw_tx_hdt that
makes up the majority of the consumed stack space.
Ideally there would be a way to avoid allocating the array on the
stack, but that would require a larger rework. Add a noinline_for_stack
annotation to avoid the warning for now, and make clang behave the same
way as gcc here. The combined stack usage is still similar, but is spread
over multiple functions now.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/infiniband/sw/siw/siw_qp_tx.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index 6432bce7d083..3a08f57d2211 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -277,6 +277,15 @@ static int siw_qp_prepare_tx(struct siw_iwarp_tx *c_tx)
 	return PKT_FRAGMENTED;
 }

+static noinline_for_stack int
+siw_sendmsg(struct socket *sock, unsigned int msg_flags,
+	    struct kvec *vec, size_t num, size_t len)
+{
+	struct msghdr msg = { .msg_flags = msg_flags };
+
+	return kernel_sendmsg(sock, &msg, vec, num, len);
+}
+
 /*
  * Send out one complete control type FPDU, or header of FPDU carrying
  * data. Used for fixed sized packets like Read.Requests or zero length
@@ -285,12 +294,11 @@ static int siw_qp_prepare_tx(struct siw_iwarp_tx *c_tx)
 static int siw_tx_ctrl(struct siw_iwarp_tx *c_tx, struct socket *s,
 			      int flags)
 {
-	struct msghdr msg = { .msg_flags = flags };
 	struct kvec iov = { .iov_base =
 				    (char *)&c_tx->pkt.ctrl + c_tx->ctrl_sent,
 			    .iov_len = c_tx->ctrl_len - c_tx->ctrl_sent };

-	int rv = kernel_sendmsg(s, &msg, &iov, 1, iov.iov_len);
+	int rv = siw_sendmsg(s, flags, &iov, 1, iov.iov_len);

 	if (rv >= 0) {
 		c_tx->ctrl_sent += rv;
@@ -427,13 +435,13 @@ static void siw_unmap_pages(struct kvec *iov, unsigned long kmap_mask, int len)
  * Write out iov referencing hdr, data and trailer of current FPDU.
  * Update transmit state dependent on write return status
  */
-static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
+static noinline_for_stack int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
+					 struct socket *s)
 {
 	struct siw_wqe *wqe = &c_tx->wqe_active;
 	struct siw_sge *sge = &wqe->sqe.sge[c_tx->sge_idx];
 	struct kvec iov[MAX_ARRAY];
 	struct page *page_array[MAX_ARRAY];
-	struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_EOR };

 	int seg = 0, do_crc = c_tx->do_crc, is_kva = 0, rv;
 	unsigned int data_len = c_tx->bytes_unsent, hdr_len = 0, trl_len = 0,
@@ -586,14 +594,16 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
 		rv = siw_0copy_tx(s, page_array, &wqe->sqe.sge[c_tx->sge_idx],
 				  c_tx->sge_off, data_len);
 		if (rv == data_len) {
-			rv = kernel_sendmsg(s, &msg, &iov[seg], 1, trl_len);
+
+			rv = siw_sendmsg(s, MSG_DONTWAIT | MSG_EOR, &iov[seg],
+					 1, trl_len);
 			if (rv > 0)
 				rv += data_len;
 			else
 				rv = data_len;
 		}
 	} else {
-		rv = kernel_sendmsg(s, &msg, iov, seg + 1,
+		rv = siw_sendmsg(s, MSG_DONTWAIT | MSG_EOR, iov, seg + 1,
 				    hdr_len + data_len + trl_len);
 		siw_unmap_pages(iov, kmap_mask, seg);
 	}
--
2.39.5
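
For readers unfamiliar with the annotation, the effect of the patch can
be sketched in user space. This is only an illustration under stated
assumptions, not the driver code: demo_msghdr, demo_send, demo_sendmsg
and demo_tx are hypothetical names, and the macro below simply maps
noinline_for_stack to the compiler's noinline attribute. The point is
that the message header lives in the helper's own frame, which is
released on return, instead of being merged into the caller's frame by
inlining.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* User-space stand-in for the kernel annotation (assumption for this sketch). */
#define noinline_for_stack __attribute__((noinline))

/* Hypothetical message header, padded so that it is visibly large. */
struct demo_msghdr {
	unsigned int flags;
	char pad[92];
};

/* Hypothetical sink standing in for kernel_sendmsg(): reports len as sent. */
static size_t demo_send(const struct demo_msghdr *msg, const char *buf,
			size_t len)
{
	(void)msg;
	(void)buf;
	return len;
}

/*
 * The header is a local of this helper only. Because the helper is never
 * inlined, its frame is separate from the caller's frame.
 */
static noinline_for_stack size_t demo_sendmsg(unsigned int flags,
					      const char *buf, size_t len)
{
	struct demo_msghdr msg;

	assert(buf != NULL);
	memset(&msg, 0, sizeof(msg));
	msg.flags = flags;
	return demo_send(&msg, buf, len);
}

/* Caller: passes flags by value and keeps no header on its own stack. */
size_t demo_tx(const char *buf, size_t len)
{
	return demo_sendmsg(0x40, buf, len);
}
```

Calling demo_tx() with a buffer and its length returns the number of
bytes the stand-in sink reports as sent. The combined stack depth is not
reduced by this pattern; it is only split across two frames.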
* Re: [PATCH] RDMA/siw: work around clang stack size warning
From: Zhu Yanjun @ 2025-06-21 4:12 UTC
To: Arnd Bergmann, Bernard Metzler, Jason Gunthorpe, Leon Romanovsky,
Nathan Chancellor
Cc: Arnd Bergmann, Nick Desaulniers, Bill Wendling, Justin Stitt,
Potnuri Bharat Teja, Showrya M N, Eric Biggers, linux-rdma,
linux-kernel, llvm
On 2025/6/20 4:43, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> clang inlines a lot of functions into siw_qp_sq_process(), with the
> aggregate stack frame blowing the warning limit in some configurations:
>
> drivers/infiniband/sw/siw/siw_qp_tx.c:1014:5: error: stack frame size (1544) exceeds limit (1280) in 'siw_qp_sq_process' [-Werror,-Wframe-larger-than]
>
> The real problem here is the array of kvec structures in siw_tx_hdt that
> makes up the majority of the consumed stack space.
Because the array of kvec structures in siw_tx_hdt consumes the majority
of the stack space, would it be possible to use kmalloc or a similar
dynamic memory allocation function instead of allocating this memory on
the stack?
Would using kmalloc (or an equivalent) also effectively resolve the
stack usage issue?
Please note that I’m not questioning the value of this commit—I’m simply
curious whether there might be an alternative solution to the problem.
Thanks,
Yanjun.Zhu
>
> Ideally there would be a way to avoid allocating the array on the
> stack, but that would require a larger rework. Add a noinline_for_stack
> annotation to avoid the warning for now, and make clang behave the same
> way as gcc here. The combined stack usage is still similar, but is spread
> over multiple functions now.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> drivers/infiniband/sw/siw/siw_qp_tx.c | 22 ++++++++++++++++------
> 1 file changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
> index 6432bce7d083..3a08f57d2211 100644
> --- a/drivers/infiniband/sw/siw/siw_qp_tx.c
> +++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
> @@ -277,6 +277,15 @@ static int siw_qp_prepare_tx(struct siw_iwarp_tx *c_tx)
> return PKT_FRAGMENTED;
> }
>
> +static noinline_for_stack int
> +siw_sendmsg(struct socket *sock, unsigned int msg_flags,
> + struct kvec *vec, size_t num, size_t len)
> +{
> + struct msghdr msg = { .msg_flags = msg_flags };
> +
> + return kernel_sendmsg(sock, &msg, vec, num, len);
> +}
> +
> /*
> * Send out one complete control type FPDU, or header of FPDU carrying
> * data. Used for fixed sized packets like Read.Requests or zero length
> @@ -285,12 +294,11 @@ static int siw_qp_prepare_tx(struct siw_iwarp_tx *c_tx)
> static int siw_tx_ctrl(struct siw_iwarp_tx *c_tx, struct socket *s,
> int flags)
> {
> - struct msghdr msg = { .msg_flags = flags };
> struct kvec iov = { .iov_base =
> (char *)&c_tx->pkt.ctrl + c_tx->ctrl_sent,
> .iov_len = c_tx->ctrl_len - c_tx->ctrl_sent };
>
> - int rv = kernel_sendmsg(s, &msg, &iov, 1, iov.iov_len);
> + int rv = siw_sendmsg(s, flags, &iov, 1, iov.iov_len);
>
> if (rv >= 0) {
> c_tx->ctrl_sent += rv;
> @@ -427,13 +435,13 @@ static void siw_unmap_pages(struct kvec *iov, unsigned long kmap_mask, int len)
> * Write out iov referencing hdr, data and trailer of current FPDU.
> * Update transmit state dependent on write return status
> */
> -static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> +static noinline_for_stack int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
> + struct socket *s)
> {
> struct siw_wqe *wqe = &c_tx->wqe_active;
> struct siw_sge *sge = &wqe->sqe.sge[c_tx->sge_idx];
> struct kvec iov[MAX_ARRAY];
> struct page *page_array[MAX_ARRAY];
> - struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_EOR };
>
> int seg = 0, do_crc = c_tx->do_crc, is_kva = 0, rv;
> unsigned int data_len = c_tx->bytes_unsent, hdr_len = 0, trl_len = 0,
> @@ -586,14 +594,16 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> rv = siw_0copy_tx(s, page_array, &wqe->sqe.sge[c_tx->sge_idx],
> c_tx->sge_off, data_len);
> if (rv == data_len) {
> - rv = kernel_sendmsg(s, &msg, &iov[seg], 1, trl_len);
> +
> + rv = siw_sendmsg(s, MSG_DONTWAIT | MSG_EOR, &iov[seg],
> + 1, trl_len);
> if (rv > 0)
> rv += data_len;
> else
> rv = data_len;
> }
> } else {
> - rv = kernel_sendmsg(s, &msg, iov, seg + 1,
> + rv = siw_sendmsg(s, MSG_DONTWAIT | MSG_EOR, iov, seg + 1,
> hdr_len + data_len + trl_len);
> siw_unmap_pages(iov, kmap_mask, seg);
> }
* Re: [PATCH] RDMA/siw: work around clang stack size warning
From: Arnd Bergmann @ 2025-06-21 8:43 UTC
To: Zhu Yanjun, Arnd Bergmann, Bernard Metzler, Jason Gunthorpe,
Leon Romanovsky, Nathan Chancellor
Cc: Nick Desaulniers, Bill Wendling, Justin Stitt,
Potnuri Bharat Teja, Showrya M N, Eric Biggers, linux-rdma,
linux-kernel, llvm
On Sat, Jun 21, 2025, at 06:12, Zhu Yanjun wrote:
> On 2025/6/20 4:43, Arnd Bergmann wrote:
>
> Because the array of kvec structures in siw_tx_hdt consumes the majority
> of the stack space, would it be possible to use kmalloc or a similar
> dynamic memory allocation function instead of allocating this memory on
> the stack?
>
> Would using kmalloc (or an equivalent) also effectively resolve the
> stack usage issue?
Yes, moving the allocation somewhere else (kmalloc, static variable,
per siw_sge, per siw_wqe) would avoid the high stack usage effectively.
It's a tradeoff, and I picked the solution that made the most sense
to me, but there is a good chance another alternative is better here.
The main differences are:
- kmalloc() adds runtime overhead that may be expensive in a
fast path
- kmalloc() can fail, which adds complexity from error handling.
Note that small allocations with GFP_KERNEL do not fail but instead
wait for memory to become available
- If kmalloc() runs into a low-memory situation, it can go through
writeback, which in turn can use more stack space than the
on-stack allocation it was replacing
- static allocations bloat the kernel image and require locking that
may be expensive
- per-object preallocations can be wasteful if a lot of objects
are created, and can still require locking if the object is used
from multiple threads
As I wrote, I mainly picked the 'noinline_for_stack' approach
here since that is how the code is known to work with gcc, so
there is little risk of my patch causing problems.
Moving both the kvec array and the page array into
the siw_wqe is likely better here, but I'm not familiar enough
with the driver to tell whether that is an overall improvement.
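
A rough user-space sketch of that per-object variant follows. It is
hypothetical throughout: demo_wqe, demo_build_iov and DEMO_MAX_FRAGS are
made-up names, struct iovec stands in for struct kvec, and the real
siw_wqe layout is not reproduced. It only shows the scratch array
moving from the sending function's stack into the object itself.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define DEMO_MAX_FRAGS 8	/* hypothetical bound, not the driver's MAX_ARRAY */

/* Hypothetical work-queue element carrying its own scratch iovec array. */
struct demo_wqe {
	size_t bytes_unsent;
	struct iovec iov[DEMO_MAX_FRAGS];	/* preallocated once per object */
};

/*
 * Fill wqe->iov with header, optional data and trailer segments.
 * Returns the number of segments used. Nothing large is placed on the
 * stack of this function.
 */
static int demo_build_iov(struct demo_wqe *wqe,
			  void *hdr, size_t hdr_len,
			  void *data, size_t data_len,
			  void *trl, size_t trl_len)
{
	int seg = 0;

	assert(wqe != NULL);

	wqe->iov[seg].iov_base = hdr;
	wqe->iov[seg].iov_len = hdr_len;
	seg++;

	if (data_len > 0) {
		wqe->iov[seg].iov_base = data;
		wqe->iov[seg].iov_len = data_len;
		seg++;
	}

	wqe->iov[seg].iov_base = trl;
	wqe->iov[seg].iov_len = trl_len;
	seg++;

	wqe->bytes_unsent = hdr_len + data_len + trl_len;
	return seg;
}
```

The cost is the one listed above: every object carries the array
whether or not it is currently transmitting, and concurrent users of
the same object would need to serialize access to it.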
A related change I would like to see is to remove the
kmap_local_page() in this driver and instead make it
depend on 64BIT or !CONFIG_HIGHMEM, to slowly chip away
at the code that is highmem aware throughout the kernel.
I'm not sure if that would also help drop the array
here.
Arnd
* Re: [PATCH] RDMA/siw: work around clang stack size warning
From: Zhu Yanjun @ 2025-06-21 16:09 UTC
To: Arnd Bergmann, Arnd Bergmann, Bernard Metzler, Jason Gunthorpe,
Leon Romanovsky, Nathan Chancellor
Cc: Nick Desaulniers, Bill Wendling, Justin Stitt,
Potnuri Bharat Teja, Showrya M N, Eric Biggers, linux-rdma,
linux-kernel, llvm
On 2025/6/21 1:43, Arnd Bergmann wrote:
> On Sat, Jun 21, 2025, at 06:12, Zhu Yanjun wrote:
>> On 2025/6/20 4:43, Arnd Bergmann wrote:
>>
>> Because the array of kvec structures in siw_tx_hdt consumes the majority
>> of the stack space, would it be possible to use kmalloc or a similar
>> dynamic memory allocation function instead of allocating this memory on
>> the stack?
>>
>> Would using kmalloc (or an equivalent) also effectively resolve the
>> stack usage issue?
> Yes, moving the allocation somewhere else (kmalloc, static variable,
> per siw_sge, per siw_wqe) would avoid the high stack usage effectively.
> It's a tradeoff, and I picked the solution that made the most sense
> to me, but there is a good chance another alternative is better here.
>
> The main differences are:
>
> - kmalloc() adds runtime overhead that may be expensive in a
> fast path
>
> - kmalloc() can fail, which adds complexity from error handling.
> Note that small allocations with GFP_KERNEL do not fail but instead
> wait for memory to become available
>
> - If kmalloc() runs into a low-memory situation, it can go through
> writeback, which in turn can use more stack space than the
> on-stack allocation it was replacing
>
> - static allocations bloat the kernel image and require locking that
> may be expensive
>
> - per-object preallocations can be wasteful if a lot of objects
> are created, and can still require locking if the object is used
> from multiple threads
>
> As I wrote, I mainly picked the 'noinline_for_stack' approach
> here since that is how the code is known to work with gcc, so
> there is little risk of my patch causing problems.
>
> Moving both the kvec array and the page array into
> the siw_wqe is likely better here, but I'm not familiar enough
> with the driver to tell whether that is an overall improvement.
Thank you very much. There are several possible solutions to this issue,
and the appropriate one depends on the specific scenario. For the
problem in siw, the noinline_for_stack approach has been selected. In my
opinion, this appears to be more of a workaround than a true fix. While
it does mitigate the issue, the underlying problem in siw still remains.
That said, now that we have a clearer understanding of the problem and
its root cause through discussions and extended analysis, a more robust
and long-term solution should eventually be proposed.
Thanks,
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Zhu Yanjun
>
> A related change I would like to see is to remove the
> kmap_local_page() in this driver and instead make it
> depend on 64BIT or !CONFIG_HIGHMEM, to slowly chip away
> at the code that is highmem aware throughout the kernel.
> I'm not sure if that would also help drop the array
> here.
>
> Arnd
--
Best Regards,
Yanjun.Zhu
* RE: [PATCH] RDMA/siw: work around clang stack size warning
From: Bernard Metzler @ 2025-06-22 13:29 UTC
To: Arnd Bergmann, Zhu Yanjun, Arnd Bergmann, Jason Gunthorpe,
Leon Romanovsky, Nathan Chancellor
Cc: Nick Desaulniers, Bill Wendling, Justin Stitt,
Potnuri Bharat Teja, Showrya M N, Eric Biggers,
linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
llvm@lists.linux.dev
> -----Original Message-----
> From: Arnd Bergmann <arnd@arndb.de>
> Sent: Saturday, June 21, 2025 10:43 AM
> To: Zhu Yanjun <yanjun.zhu@linux.dev>; Arnd Bergmann <arnd@kernel.org>;
> Bernard Metzler <BMT@zurich.ibm.com>; Jason Gunthorpe <jgg@ziepe.ca>; Leon
> Romanovsky <leon@kernel.org>; Nathan Chancellor <nathan@kernel.org>
> Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>; Bill Wendling
> <morbo@google.com>; Justin Stitt <justinstitt@google.com>; Potnuri Bharat
> Teja <bharat@chelsio.com>; Showrya M N <showrya@chelsio.com>; Eric Biggers
> <ebiggers@google.com>; linux-rdma@vger.kernel.org; linux-
> kernel@vger.kernel.org; llvm@lists.linux.dev
> Subject: [EXTERNAL] Re: [PATCH] RDMA/siw: work around clang stack size
> warning
>
> On Sat, Jun 21, 2025, at 06:12, Zhu Yanjun wrote:
> > On 2025/6/20 4:43, Arnd Bergmann wrote:
> >
> > Because the array of kvec structures in siw_tx_hdt consumes the majority
> > of the stack space, would it be possible to use kmalloc or a similar
> > dynamic memory allocation function instead of allocating this memory on
> > the stack?
> >
> > Would using kmalloc (or an equivalent) also effectively resolve the
> > stack usage issue?
>
> Yes, moving the allocation somewhere else (kmalloc, static variable,
> per siw_sge, per siw_wqe) would avoid the high stack usage effectively.
> It's a tradeoff, and I picked the solution that made the most sense
> to me, but there is a good chance another alternative is better here.
>
> The main differences are:
>
> - kmalloc() adds runtime overhead that may be expensive in a
> fast path
Doing kmalloc in the fast data send path is what I clearly wanted
to avoid. The current code is a performance optimization which tries
to send the complete iWARP packet in one kernel_sendmsg() call.
A packet may comprise multiple pages referencing user data of
multiple SGEs to be sent, plus a packet header and a trailer CRC.
The array size reflects the maximum number of packet fragments
possible.
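
To make the stack cost of such an array concrete, here is a small
user-space sketch of the arithmetic. The names demo_max_frags and
demo_array_bytes are hypothetical, struct iovec stands in for struct
kvec, and the fragment formula (one header, the data pages, one
trailer) is a simplification of the description above rather than the
driver's MAX_ARRAY definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/*
 * Simplified worst-case fragment count for one packet:
 * one header + data_pages data fragments + one trailer.
 */
static size_t demo_max_frags(size_t data_pages)
{
	assert(data_pages <= SIZE_MAX - 2);
	return 1 + data_pages + 1;
}

/*
 * Stack bytes needed when each fragment takes one iovec entry and one
 * page pointer, as with the two arrays in siw_tx_hdt().
 */
static size_t demo_array_bytes(size_t nfrags)
{
	return nfrags * (sizeof(struct iovec) + sizeof(void *));
}
```

On a 64-bit target where pointers and size_t are 8 bytes each, every
fragment costs 24 bytes across the two arrays, so the total grows
quickly with the maximum fragment count.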
In the long run, I shall refactor that code to avoid the issue.
I appreciate Arnd's fix for now. I'll test and come back soon.
Many thanks,
Bernard
>
> - kmalloc() can fail, which adds complexity from error handling.
> Note that small allocations with GFP_KERNEL do not fail but instead
> wait for memory to become available
>
> - If kmalloc() runs into a low-memory situation, it can go through
> writeback, which in turn can use more stack space than the
> on-stack allocation it was replacing
>
> - static allocations bloat the kernel image and require locking that
> may be expensive
>
> - per-object preallocations can be wasteful if a lot of objects
> are created, and can still require locking if the object is used
> from multiple threads
>
> As I wrote, I mainly picked the 'noinline_for_stack' approach
> here since that is how the code is known to work with gcc, so
> there is little risk of my patch causing problems.
>
> Moving both the kvec array and the page array into
> the siw_wqe is likely better here, but I'm not familiar enough
> with the driver to tell whether that is an overall improvement.
>
> A related change I would like to see is to remove the
> kmap_local_page() in this driver and instead make it
> depend on 64BIT or !CONFIG_HIGHMEM, to slowly chip away
> at the code that is highmem aware throughout the kernel.
> I'm not sure if that would also help drop the array
> here.
>
> Arnd
* RE: [PATCH] RDMA/siw: work around clang stack size warning
From: Bernard Metzler @ 2025-06-24 10:47 UTC
To: Arnd Bergmann, Jason Gunthorpe, Leon Romanovsky,
Nathan Chancellor
Cc: Arnd Bergmann, Nick Desaulniers, Bill Wendling, Justin Stitt,
Potnuri Bharat Teja, Showrya M N, Eric Biggers,
linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
llvm@lists.linux.dev
> -----Original Message-----
> From: Arnd Bergmann <arnd@kernel.org>
> Sent: Friday, June 20, 2025 1:43 PM
> To: Bernard Metzler <BMT@zurich.ibm.com>; Jason Gunthorpe <jgg@ziepe.ca>;
> Leon Romanovsky <leon@kernel.org>; Nathan Chancellor <nathan@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>; Nick Desaulniers
> <nick.desaulniers+lkml@gmail.com>; Bill Wendling <morbo@google.com>; Justin
> Stitt <justinstitt@google.com>; Potnuri Bharat Teja <bharat@chelsio.com>;
> Showrya M N <showrya@chelsio.com>; Eric Biggers <ebiggers@google.com>;
> linux-rdma@vger.kernel.org; linux-kernel@vger.kernel.org;
> llvm@lists.linux.dev
> Subject: [EXTERNAL] [PATCH] RDMA/siw: work around clang stack size warning
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> clang inlines a lot of functions into siw_qp_sq_process(), with the
> aggregate stack frame blowing the warning limit in some configurations:
>
> drivers/infiniband/sw/siw/siw_qp_tx.c:1014:5: error: stack frame size
> (1544) exceeds limit (1280) in 'siw_qp_sq_process' [-Werror,-Wframe-larger-
> than]
>
> The real problem here is the array of kvec structures in siw_tx_hdt that
> makes up the majority of the consumed stack space.
>
> Ideally there would be a way to avoid allocating the array on the
> stack, but that would require a larger rework. Add a noinline_for_stack
> annotation to avoid the warning for now, and make clang behave the same
> way as gcc here. The combined stack usage is still similar, but is spread
> over multiple functions now.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> drivers/infiniband/sw/siw/siw_qp_tx.c | 22 ++++++++++++++++------
> 1 file changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c
> b/drivers/infiniband/sw/siw/siw_qp_tx.c
> index 6432bce7d083..3a08f57d2211 100644
> --- a/drivers/infiniband/sw/siw/siw_qp_tx.c
> +++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
> @@ -277,6 +277,15 @@ static int siw_qp_prepare_tx(struct siw_iwarp_tx
> *c_tx)
> return PKT_FRAGMENTED;
> }
>
> +static noinline_for_stack int
> +siw_sendmsg(struct socket *sock, unsigned int msg_flags,
> + struct kvec *vec, size_t num, size_t len)
> +{
> + struct msghdr msg = { .msg_flags = msg_flags };
> +
> + return kernel_sendmsg(sock, &msg, vec, num, len);
> +}
> +
> /*
> * Send out one complete control type FPDU, or header of FPDU carrying
> * data. Used for fixed sized packets like Read.Requests or zero length
> @@ -285,12 +294,11 @@ static int siw_qp_prepare_tx(struct siw_iwarp_tx
> *c_tx)
> static int siw_tx_ctrl(struct siw_iwarp_tx *c_tx, struct socket *s,
> int flags)
> {
> - struct msghdr msg = { .msg_flags = flags };
> struct kvec iov = { .iov_base =
> (char *)&c_tx->pkt.ctrl + c_tx->ctrl_sent,
> .iov_len = c_tx->ctrl_len - c_tx->ctrl_sent };
>
> - int rv = kernel_sendmsg(s, &msg, &iov, 1, iov.iov_len);
> + int rv = siw_sendmsg(s, flags, &iov, 1, iov.iov_len);
>
> if (rv >= 0) {
> c_tx->ctrl_sent += rv;
> @@ -427,13 +435,13 @@ static void siw_unmap_pages(struct kvec *iov,
> unsigned long kmap_mask, int len)
> * Write out iov referencing hdr, data and trailer of current FPDU.
> * Update transmit state dependent on write return status
> */
> -static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> +static noinline_for_stack int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
> + struct socket *s)
> {
> struct siw_wqe *wqe = &c_tx->wqe_active;
> struct siw_sge *sge = &wqe->sqe.sge[c_tx->sge_idx];
> struct kvec iov[MAX_ARRAY];
> struct page *page_array[MAX_ARRAY];
> - struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_EOR };
>
> int seg = 0, do_crc = c_tx->do_crc, is_kva = 0, rv;
> unsigned int data_len = c_tx->bytes_unsent, hdr_len = 0, trl_len = 0,
> @@ -586,14 +594,16 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
> struct socket *s)
> rv = siw_0copy_tx(s, page_array, &wqe->sqe.sge[c_tx->sge_idx],
> c_tx->sge_off, data_len);
> if (rv == data_len) {
> - rv = kernel_sendmsg(s, &msg, &iov[seg], 1, trl_len);
> +
> + rv = siw_sendmsg(s, MSG_DONTWAIT | MSG_EOR, &iov[seg],
> + 1, trl_len);
> if (rv > 0)
> rv += data_len;
> else
> rv = data_len;
> }
> } else {
> - rv = kernel_sendmsg(s, &msg, iov, seg + 1,
> + rv = siw_sendmsg(s, MSG_DONTWAIT | MSG_EOR, iov, seg + 1,
> hdr_len + data_len + trl_len);
> siw_unmap_pages(iov, kmap_mask, seg);
> }
> --
> 2.39.5
Looks good and was tested.
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
* Re: [PATCH] RDMA/siw: work around clang stack size warning
From: Leon Romanovsky @ 2025-06-25 11:14 UTC
To: Bernard Metzler, Jason Gunthorpe, Nathan Chancellor,
Arnd Bergmann
Cc: Arnd Bergmann, Nick Desaulniers, Bill Wendling, Justin Stitt,
Potnuri Bharat Teja, Showrya M N, Eric Biggers, linux-rdma,
linux-kernel, llvm
On Fri, 20 Jun 2025 13:43:28 +0200, Arnd Bergmann wrote:
> clang inlines a lot of functions into siw_qp_sq_process(), with the
> aggregate stack frame blowing the warning limit in some configurations:
>
> drivers/infiniband/sw/siw/siw_qp_tx.c:1014:5: error: stack frame size (1544) exceeds limit (1280) in 'siw_qp_sq_process' [-Werror,-Wframe-larger-than]
>
> The real problem here is the array of kvec structures in siw_tx_hdt that
> makes up the majority of the consumed stack space.
>
> [...]
Applied, thanks!
[1/1] RDMA/siw: work around clang stack size warning
https://git.kernel.org/rdma/rdma/c/842cf5a6e35656
Best regards,
--
Leon Romanovsky <leon@kernel.org>