* [PATCH net-next] net/smc: cap allocation order for SMC-R physically contiguous buffers
@ 2026-03-12 8:21 D. Wythe
2026-03-17 2:45 ` [net-next] " Jakub Kicinski
0 siblings, 1 reply; 2+ messages in thread
From: D. Wythe @ 2026-03-12 8:21 UTC (permalink / raw)
To: David S. Miller, Dust Li, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Sidraya Jayagond, Wenjia Zhang
Cc: Mahanta Jambigi, Simon Horman, Tony Lu, Wen Gu, linux-kernel,
linux-rdma, linux-s390, netdev, oliver.yang, pasic
The alloc_page() cannot satisfy requests exceeding MAX_PAGE_ORDER,
and attempting such allocations will lead to guaranteed failures
and potential kernel warnings.
For SMCR_PHYS_CONT_BUFS, cap the allocation order to MAX_PAGE_ORDER.
This ensures the attempts to allocate the largest possible physically
contiguous chunk succeed, instead of failing with an invalid order.
This also avoids redundant "try-fail-degrade" cycles in
__smc_buf_create().
For SMCR_MIXED_BUFS, if its order exceeds MAX_PAGE_ORDER, skip the
doomed physical allocation attempt and fallback to virtual memory
immediately.
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
---
net/smc/smc_core.c | 30 +++++++++++++++++++-----------
1 file changed, 19 insertions(+), 11 deletions(-)
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index e2d083daeb7e..a18730edb7e0 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -2314,6 +2314,10 @@ int smcr_buf_reg_lgr(struct smc_link *lnk)
return rc;
}
+/*
+ * smcr_new_buf_create may allocate a buffer smaller than the requested
+ * bufsize. Use buf_desc->len to determine the actual allocated size.
+ */
static struct smc_buf_desc *smcr_new_buf_create(struct smc_link_group *lgr,
int bufsize)
{
@@ -2326,18 +2330,22 @@ static struct smc_buf_desc *smcr_new_buf_create(struct smc_link_group *lgr,
switch (lgr->buf_type) {
case SMCR_PHYS_CONT_BUFS:
+ bufsize = min(bufsize, (int)PAGE_SIZE << MAX_PAGE_ORDER);
+ fallthrough;
case SMCR_MIXED_BUFS:
buf_desc->order = get_order(bufsize);
- buf_desc->pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN |
- __GFP_NOMEMALLOC | __GFP_COMP |
- __GFP_NORETRY | __GFP_ZERO,
- buf_desc->order);
- if (buf_desc->pages) {
- buf_desc->cpu_addr =
- (void *)page_address(buf_desc->pages);
- buf_desc->len = bufsize;
- buf_desc->is_vm = false;
- break;
+ if (buf_desc->order <= MAX_PAGE_ORDER) {
+ buf_desc->pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN |
+ __GFP_NOMEMALLOC | __GFP_COMP |
+ __GFP_NORETRY | __GFP_ZERO,
+ buf_desc->order);
+ if (buf_desc->pages) {
+ buf_desc->cpu_addr =
+ (void *)page_address(buf_desc->pages);
+ buf_desc->len = bufsize;
+ buf_desc->is_vm = false;
+ break;
+ }
}
if (lgr->buf_type == SMCR_PHYS_CONT_BUFS)
goto out;
@@ -2476,7 +2484,7 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb)
}
SMC_STAT_RMB_ALLOC(smc, is_smcd, is_rmb);
- SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, true, bufsize);
+ SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, true, buf_desc->len);
buf_desc->used = 1;
down_write(lock);
smc_lgr_buf_list_add(lgr, is_rmb, buf_list, buf_desc);
--
2.45.0
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [net-next] net/smc: cap allocation order for SMC-R physically contiguous buffers
2026-03-12 8:21 [PATCH net-next] net/smc: cap allocation order for SMC-R physically contiguous buffers D. Wythe
@ 2026-03-17 2:45 ` Jakub Kicinski
0 siblings, 0 replies; 2+ messages in thread
From: Jakub Kicinski @ 2026-03-17 2:45 UTC (permalink / raw)
To: alibuda
Cc: Jakub Kicinski, netdev, edumazet, tonylu, wenjia, pabeni, guwen,
davem, linux-kernel, mjambigi, dust.li, oliver.yang, sidraya,
linux-s390, horms, pasic, linux-rdma
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
net/smc: cap allocation order for SMC-R physically contiguous buffers
The alloc_page() cannot satisfy requests exceeding MAX_PAGE_ORDER, and
attempting such allocations will lead to guaranteed failures and potential
kernel warnings. For SMCR_PHYS_CONT_BUFS, cap the allocation order to
MAX_PAGE_ORDER. This ensures the attempts to allocate the largest possible
physically contiguous chunk succeed. For SMCR_MIXED_BUFS, if its order
exceeds MAX_PAGE_ORDER, skip the physical allocation attempt and fallback to
virtual memory immediately.
> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index e2d083daeb7e..a18730edb7e0 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -2314,6 +2314,10 @@ int smcr_buf_reg_lgr(struct smc_link *lnk)
> return rc;
> }
>
> +/*
> + * smcr_new_buf_create may allocate a buffer smaller than the requested
> + * bufsize. Use buf_desc->len to determine the actual allocated size.
> + */
> static struct smc_buf_desc *smcr_new_buf_create(struct smc_link_group *lgr,
> int bufsize)
> {
> @@ -2326,18 +2330,22 @@ static struct smc_buf_desc *smcr_new_buf_create(struct smc_link_group *lgr,
>
> switch (lgr->buf_type) {
> case SMCR_PHYS_CONT_BUFS:
> + bufsize = min(bufsize, (int)PAGE_SIZE << MAX_PAGE_ORDER);
> + fallthrough;
> case SMCR_MIXED_BUFS:
> buf_desc->order = get_order(bufsize);
> - buf_desc->pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN |
> - __GFP_NOMEMALLOC | __GFP_COMP |
> - __GFP_NORETRY | __GFP_ZERO,
> - buf_desc->order);
> - if (buf_desc->pages) {
> - buf_desc->cpu_addr =
> - (void *)page_address(buf_desc->pages);
> - buf_desc->len = bufsize;
> - buf_desc->is_vm = false;
> - break;
> + if (buf_desc->order <= MAX_PAGE_ORDER) {
> + buf_desc->pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN |
> + __GFP_NOMEMALLOC | __GFP_COMP |
> + __GFP_NORETRY | __GFP_ZERO,
> + buf_desc->order);
> + if (buf_desc->pages) {
> + buf_desc->cpu_addr =
> + (void *)page_address(buf_desc->pages);
> + buf_desc->len = bufsize;
> + buf_desc->is_vm = false;
> + break;
> + }
> }
> if (lgr->buf_type == SMCR_PHYS_CONT_BUFS)
> goto out;
[ ... ]
> @@ -2476,7 +2484,7 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb)
> }
>
> SMC_STAT_RMB_ALLOC(smc, is_smcd, is_rmb);
> - SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, true, bufsize);
> + SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, true, buf_desc->len);
> buf_desc->used = 1;
> down_write(lock);
> smc_lgr_buf_list_add(lgr, is_rmb, buf_list, buf_desc);
> up_write(lock);
> break; /* found */
> }
>
> if (IS_ERR(buf_desc))
> return PTR_ERR(buf_desc);
>
> if (!is_smcd) {
> if (smcr_buf_map_usable_links(lgr, buf_desc, is_rmb)) {
> smcr_buf_unuse(buf_desc, is_rmb, lgr);
> return -ENOMEM;
> }
> }
>
> if (is_rmb) {
> conn->rmb_desc = buf_desc;
> conn->rmbe_size_comp = bufsize_comp;
> smc->sk.sk_rcvbuf = bufsize * 2;
> atomic_set(&conn->bytes_to_rcv, 0);
> conn->rmbe_update_limit =
> smc_rmb_wnd_update_limit(buf_desc->len);
> if (is_smcd)
> smc_ism_set_conn(conn); /* map RMB/smcd_dev to conn */
> } else {
> conn->sndbuf_desc = buf_desc;
> smc->sk.sk_sndbuf = bufsize * 2;
> atomic_set(&conn->sndbuf_space, bufsize);
^^^^^^^
The comment added at smcr_new_buf_create() says "Use buf_desc->len to
determine the actual allocated size," but __smc_buf_create() still uses the
uncapped bufsize for several critical fields. Is this correct?
For SMCR_PHYS_CONT_BUFS, smcr_new_buf_create() now caps bufsize internally
to PAGE_SIZE << MAX_PAGE_ORDER (typically 4MB), so buf_desc->len can be
smaller than the requested bufsize.
However, __smc_buf_create() continues using the original uncapped bufsize
for:
if (is_rmb) {
conn->rmbe_size_comp = bufsize_comp;
smc->sk.sk_rcvbuf = bufsize * 2;
} else {
smc->sk.sk_sndbuf = bufsize * 2;
atomic_set(&conn->sndbuf_space, bufsize);
}
The sndbuf_space mismatch is particularly concerning. Looking at
smc_tx.c:274, the comment states: "0 <= sndbuf_space <= sndbuf_desc->len".
When sndbuf_space exceeds sndbuf_desc->len, smc_tx_sendmsg() could calculate
writespace incorrectly:
int writespace = atomic_read(&conn->sndbuf_space);
and the two-chunk copy loop might write beyond the actual buffer boundary.
This is reachable with default sysctl settings (SMCR_PHYS_CONT_BUFS) and
socket buffer sizes exceeding 4MB.
Should these assignments use buf_desc->len instead of bufsize? For example:
atomic_set(&conn->sndbuf_space, buf_desc->len);
smc->sk.sk_sndbuf = buf_desc->len * 2;
smc->sk.sk_rcvbuf = buf_desc->len * 2;
And should rmbe_size_comp be recalculated from buf_desc->len? The
rmbe_size_comp value is sent to the peer via CLC messages in smc_clc.c:1125,
and an inflated value might cause the peer to believe the RMB is larger than
it actually is.
> }
> return 0;
> }
--
pw-bot: cr
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2026-03-17 2:45 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-12 8:21 [PATCH net-next] net/smc: cap allocation order for SMC-R physically contiguous buffers D. Wythe
2026-03-17 2:45 ` [net-next] " Jakub Kicinski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox