From: "D. Wythe" <alibuda@linux.alibaba.com >
To: Jakub Kicinski <kuba@kernel.org>
Cc: alibuda@linux.alibaba.com, netdev@vger.kernel.org,
edumazet@google.com, tonylu@linux.alibaba.com,
wenjia@linux.ibm.com, pabeni@redhat.com, guwen@linux.alibaba.com,
davem@davemloft.net, linux-kernel@vger.kernel.org,
mjambigi@linux.ibm.com, dust.li@linux.alibaba.com,
oliver.yang@linux.alibaba.com, sidraya@linux.ibm.com,
linux-s390@vger.kernel.org, horms@kernel.org,
pasic@linux.ibm.com, linux-rdma@vger.kernel.org
Subject: Re: [net-next] net/smc: cap allocation order for SMC-R physically contiguous buffers
Date: Fri, 3 Apr 2026 14:12:46 +0800 [thread overview]
Message-ID: <20260403061246.GA61656@j66a10360.sqa.eu95> (raw)
In-Reply-To: <20260317024534.588136-1-kuba@kernel.org>
On Mon, Mar 16, 2026 at 07:45:34PM -0700, Jakub Kicinski wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
>
> For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
> ---
> net/smc: cap allocation order for SMC-R physically contiguous buffers
>
> The alloc_page() cannot satisfy requests exceeding MAX_PAGE_ORDER, and
> attempting such allocations will lead to guaranteed failures and potential
> kernel warnings. For SMCR_PHYS_CONT_BUFS, cap the allocation order to
> MAX_PAGE_ORDER. This ensures the attempts to allocate the largest possible
> physically contiguous chunk succeed. For SMCR_MIXED_BUFS, if its order
> exceeds MAX_PAGE_ORDER, skip the physical allocation attempt and fallback to
> virtual memory immediately.
>
> > diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> > index e2d083daeb7e..a18730edb7e0 100644
> > --- a/net/smc/smc_core.c
> > +++ b/net/smc/smc_core.c
> > @@ -2314,6 +2314,10 @@ int smcr_buf_reg_lgr(struct smc_link *lnk)
> > return rc;
> > }
> >
> > +/*
> > + * smcr_new_buf_create may allocate a buffer smaller than the requested
> > + * bufsize. Use buf_desc->len to determine the actual allocated size.
> > + */
> > static struct smc_buf_desc *smcr_new_buf_create(struct smc_link_group *lgr,
> > int bufsize)
> > {
> > @@ -2326,18 +2330,22 @@ static struct smc_buf_desc *smcr_new_buf_create(struct smc_link_group *lgr,
> >
> > switch (lgr->buf_type) {
> > case SMCR_PHYS_CONT_BUFS:
> > + bufsize = min(bufsize, (int)PAGE_SIZE << MAX_PAGE_ORDER);
> > + fallthrough;
> > case SMCR_MIXED_BUFS:
> > buf_desc->order = get_order(bufsize);
> > - buf_desc->pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN |
> > - __GFP_NOMEMALLOC | __GFP_COMP |
> > - __GFP_NORETRY | __GFP_ZERO,
> > - buf_desc->order);
> > - if (buf_desc->pages) {
> > - buf_desc->cpu_addr =
> > - (void *)page_address(buf_desc->pages);
> > - buf_desc->len = bufsize;
> > - buf_desc->is_vm = false;
> > - break;
> > + if (buf_desc->order <= MAX_PAGE_ORDER) {
> > + buf_desc->pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN |
> > + __GFP_NOMEMALLOC | __GFP_COMP |
> > + __GFP_NORETRY | __GFP_ZERO,
> > + buf_desc->order);
> > + if (buf_desc->pages) {
> > + buf_desc->cpu_addr =
> > + (void *)page_address(buf_desc->pages);
> > + buf_desc->len = bufsize;
> > + buf_desc->is_vm = false;
> > + break;
> > + }
> > }
> > if (lgr->buf_type == SMCR_PHYS_CONT_BUFS)
> > goto out;
>
> [ ... ]
>
> > @@ -2476,7 +2484,7 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb)
> > }
> >
> > SMC_STAT_RMB_ALLOC(smc, is_smcd, is_rmb);
> > - SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, true, bufsize);
> > + SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, true, buf_desc->len);
> > buf_desc->used = 1;
> > down_write(lock);
> > smc_lgr_buf_list_add(lgr, is_rmb, buf_list, buf_desc);
> > up_write(lock);
> > break; /* found */
> > }
> >
> > if (IS_ERR(buf_desc))
> > return PTR_ERR(buf_desc);
> >
> > if (!is_smcd) {
> > if (smcr_buf_map_usable_links(lgr, buf_desc, is_rmb)) {
> > smcr_buf_unuse(buf_desc, is_rmb, lgr);
> > return -ENOMEM;
> > }
> > }
> >
> > if (is_rmb) {
> > conn->rmb_desc = buf_desc;
> > conn->rmbe_size_comp = bufsize_comp;
> > smc->sk.sk_rcvbuf = bufsize * 2;
> > atomic_set(&conn->bytes_to_rcv, 0);
> > conn->rmbe_update_limit =
> > smc_rmb_wnd_update_limit(buf_desc->len);
> > if (is_smcd)
> > smc_ism_set_conn(conn); /* map RMB/smcd_dev to conn */
> > } else {
> > conn->sndbuf_desc = buf_desc;
> > smc->sk.sk_sndbuf = bufsize * 2;
> > atomic_set(&conn->sndbuf_space, bufsize);
> ^^^^^^^
>
> The comment added at smcr_new_buf_create() says "Use buf_desc->len to
> determine the actual allocated size," but __smc_buf_create() still uses the
> uncapped bufsize for several critical fields. Is this correct?
>
> For SMCR_PHYS_CONT_BUFS, smcr_new_buf_create() now caps bufsize internally
> to PAGE_SIZE << MAX_PAGE_ORDER (typically 4MB), so buf_desc->len can be
> smaller than the requested bufsize.
>
> However, __smc_buf_create() continues using the original uncapped bufsize
> for:
>
> if (is_rmb) {
> conn->rmbe_size_comp = bufsize_comp;
> smc->sk.sk_rcvbuf = bufsize * 2;
> } else {
> smc->sk.sk_sndbuf = bufsize * 2;
> atomic_set(&conn->sndbuf_space, bufsize);
> }
>
> The sndbuf_space mismatch is particularly concerning. Looking at
> smc_tx.c:274, the comment states: "0 <= sndbuf_space <= sndbuf_desc->len".
> When sndbuf_space exceeds sndbuf_desc->len, smc_tx_sendmsg() could calculate
> writespace incorrectly:
>
> int writespace = atomic_read(&conn->sndbuf_space);
>
> and the two-chunk copy loop might write beyond the actual buffer boundary.
> This is reachable with default sysctl settings (SMCR_PHYS_CONT_BUFS) and
> socket buffer sizes exceeding 4MB.
>
> Should these assignments use buf_desc->len instead of bufsize? For example:
>
> atomic_set(&conn->sndbuf_space, buf_desc->len);
> smc->sk.sk_sndbuf = buf_desc->len * 2;
> smc->sk.sk_rcvbuf = buf_desc->len * 2;
>
> And should rmbe_size_comp be recalculated from buf_desc->len? The
> rmbe_size_comp value is sent to the peer via CLC messages in smc_clc.c:1125,
> and an inflated value might cause the peer to believe the RMB is larger than
> it actually is.
>
Thanks for catching this.
After rethinking this, I think we should not change the semantic of
smcr_new_buf_create() to return a potentially smaller buffer than
requested. The MAX_PAGE_ORDER limit is better handled in __smc_buf_create(),
where the effective buffer size is already known and used to initialize
socket/accounting state.
I'll update the patch accordingly.
> > }
> > return 0;
> > }
> --
> pw-bot: cr
prev parent reply other threads:[~2026-04-03 6:12 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-12 8:21 [PATCH net-next] net/smc: cap allocation order for SMC-R physically contiguous buffers D. Wythe
2026-03-17 2:45 ` [net-next] " Jakub Kicinski
2026-04-03 6:12 ` D. Wythe [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260403061246.GA61656@j66a10360.sqa.eu95 \
--to=alibuda@linux.alibaba.com \
--cc=davem@davemloft.net \
--cc=dust.li@linux.alibaba.com \
--cc=edumazet@google.com \
--cc=guwen@linux.alibaba.com \
--cc=horms@kernel.org \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=mjambigi@linux.ibm.com \
--cc=netdev@vger.kernel.org \
--cc=oliver.yang@linux.alibaba.com \
--cc=pabeni@redhat.com \
--cc=pasic@linux.ibm.com \
--cc=sidraya@linux.ibm.com \
--cc=tonylu@linux.alibaba.com \
--cc=wenjia@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox