* [PATCH net-next] net/smc: cap allocation order for SMC-R physically contiguous buffers @ 2026-03-12 8:21 D. Wythe 2026-03-17 2:45 ` [net-next] " Jakub Kicinski 0 siblings, 1 reply; 3+ messages in thread From: D. Wythe @ 2026-03-12 8:21 UTC (permalink / raw) To: David S. Miller, Dust Li, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Sidraya Jayagond, Wenjia Zhang Cc: Mahanta Jambigi, Simon Horman, Tony Lu, Wen Gu, linux-kernel, linux-rdma, linux-s390, netdev, oliver.yang, pasic The alloc_page() cannot satisfy requests exceeding MAX_PAGE_ORDER, and attempting such allocations will lead to guaranteed failures and potential kernel warnings. For SMCR_PHYS_CONT_BUFS, cap the allocation order to MAX_PAGE_ORDER. This ensures the attempts to allocate the largest possible physically contiguous chunk succeed, instead of failing with an invalid order. This also avoids redundant "try-fail-degrade" cycles in __smc_buf_create(). For SMCR_MIXED_BUFS, if its order exceeds MAX_PAGE_ORDER, skip the doomed physical allocation attempt and fallback to virtual memory immediately. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> --- net/smc/smc_core.c | 30 +++++++++++++++++++----------- 1 file changed, 19 insertions(+), 11 deletions(-) diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index e2d083daeb7e..a18730edb7e0 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -2314,6 +2314,10 @@ int smcr_buf_reg_lgr(struct smc_link *lnk) return rc; } +/* + * smcr_new_buf_create may allocate a buffer smaller than the requested + * bufsize. Use buf_desc->len to determine the actual allocated size. 
+ */ static struct smc_buf_desc *smcr_new_buf_create(struct smc_link_group *lgr, int bufsize) { @@ -2326,18 +2330,22 @@ static struct smc_buf_desc *smcr_new_buf_create(struct smc_link_group *lgr, switch (lgr->buf_type) { case SMCR_PHYS_CONT_BUFS: + bufsize = min(bufsize, (int)PAGE_SIZE << MAX_PAGE_ORDER); + fallthrough; case SMCR_MIXED_BUFS: buf_desc->order = get_order(bufsize); - buf_desc->pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN | - __GFP_NOMEMALLOC | __GFP_COMP | - __GFP_NORETRY | __GFP_ZERO, - buf_desc->order); - if (buf_desc->pages) { - buf_desc->cpu_addr = - (void *)page_address(buf_desc->pages); - buf_desc->len = bufsize; - buf_desc->is_vm = false; - break; + if (buf_desc->order <= MAX_PAGE_ORDER) { + buf_desc->pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN | + __GFP_NOMEMALLOC | __GFP_COMP | + __GFP_NORETRY | __GFP_ZERO, + buf_desc->order); + if (buf_desc->pages) { + buf_desc->cpu_addr = + (void *)page_address(buf_desc->pages); + buf_desc->len = bufsize; + buf_desc->is_vm = false; + break; + } } if (lgr->buf_type == SMCR_PHYS_CONT_BUFS) goto out; @@ -2476,7 +2484,7 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb) } SMC_STAT_RMB_ALLOC(smc, is_smcd, is_rmb); - SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, true, bufsize); + SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, true, buf_desc->len); buf_desc->used = 1; down_write(lock); smc_lgr_buf_list_add(lgr, is_rmb, buf_list, buf_desc); -- 2.45.0 ^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [net-next] net/smc: cap allocation order for SMC-R physically contiguous buffers 2026-03-12 8:21 [PATCH net-next] net/smc: cap allocation order for SMC-R physically contiguous buffers D. Wythe @ 2026-03-17 2:45 ` Jakub Kicinski 2026-04-03 6:12 ` D. Wythe 0 siblings, 1 reply; 3+ messages in thread From: Jakub Kicinski @ 2026-03-17 2:45 UTC (permalink / raw) To: alibuda Cc: Jakub Kicinski, netdev, edumazet, tonylu, wenjia, pabeni, guwen, davem, linux-kernel, mjambigi, dust.li, oliver.yang, sidraya, linux-s390, horms, pasic, linux-rdma This is an AI-generated review of your patch. The human sending this email has considered the AI review valid, or at least plausible. For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html --- net/smc: cap allocation order for SMC-R physically contiguous buffers The alloc_page() cannot satisfy requests exceeding MAX_PAGE_ORDER, and attempting such allocations will lead to guaranteed failures and potential kernel warnings. For SMCR_PHYS_CONT_BUFS, cap the allocation order to MAX_PAGE_ORDER. This ensures the attempts to allocate the largest possible physically contiguous chunk succeed. For SMCR_MIXED_BUFS, if its order exceeds MAX_PAGE_ORDER, skip the physical allocation attempt and fallback to virtual memory immediately. > diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c > index e2d083daeb7e..a18730edb7e0 100644 > --- a/net/smc/smc_core.c > +++ b/net/smc/smc_core.c > @@ -2314,6 +2314,10 @@ int smcr_buf_reg_lgr(struct smc_link *lnk) > return rc; > } > > +/* > + * smcr_new_buf_create may allocate a buffer smaller than the requested > + * bufsize. Use buf_desc->len to determine the actual allocated size. 
> + */ > static struct smc_buf_desc *smcr_new_buf_create(struct smc_link_group *lgr, > int bufsize) > { > @@ -2326,18 +2330,22 @@ static struct smc_buf_desc *smcr_new_buf_create(struct smc_link_group *lgr, > > switch (lgr->buf_type) { > case SMCR_PHYS_CONT_BUFS: > + bufsize = min(bufsize, (int)PAGE_SIZE << MAX_PAGE_ORDER); > + fallthrough; > case SMCR_MIXED_BUFS: > buf_desc->order = get_order(bufsize); > - buf_desc->pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN | > - __GFP_NOMEMALLOC | __GFP_COMP | > - __GFP_NORETRY | __GFP_ZERO, > - buf_desc->order); > - if (buf_desc->pages) { > - buf_desc->cpu_addr = > - (void *)page_address(buf_desc->pages); > - buf_desc->len = bufsize; > - buf_desc->is_vm = false; > - break; > + if (buf_desc->order <= MAX_PAGE_ORDER) { > + buf_desc->pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN | > + __GFP_NOMEMALLOC | __GFP_COMP | > + __GFP_NORETRY | __GFP_ZERO, > + buf_desc->order); > + if (buf_desc->pages) { > + buf_desc->cpu_addr = > + (void *)page_address(buf_desc->pages); > + buf_desc->len = bufsize; > + buf_desc->is_vm = false; > + break; > + } > } > if (lgr->buf_type == SMCR_PHYS_CONT_BUFS) > goto out; [ ... 
] > @@ -2476,7 +2484,7 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb) > } > > SMC_STAT_RMB_ALLOC(smc, is_smcd, is_rmb); > - SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, true, bufsize); > + SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, true, buf_desc->len); > buf_desc->used = 1; > down_write(lock); > smc_lgr_buf_list_add(lgr, is_rmb, buf_list, buf_desc); > up_write(lock); > break; /* found */ > } > > if (IS_ERR(buf_desc)) > return PTR_ERR(buf_desc); > > if (!is_smcd) { > if (smcr_buf_map_usable_links(lgr, buf_desc, is_rmb)) { > smcr_buf_unuse(buf_desc, is_rmb, lgr); > return -ENOMEM; > } > } > > if (is_rmb) { > conn->rmb_desc = buf_desc; > conn->rmbe_size_comp = bufsize_comp; > smc->sk.sk_rcvbuf = bufsize * 2; > atomic_set(&conn->bytes_to_rcv, 0); > conn->rmbe_update_limit = > smc_rmb_wnd_update_limit(buf_desc->len); > if (is_smcd) > smc_ism_set_conn(conn); /* map RMB/smcd_dev to conn */ > } else { > conn->sndbuf_desc = buf_desc; > smc->sk.sk_sndbuf = bufsize * 2; > atomic_set(&conn->sndbuf_space, bufsize); ^^^^^^^ The comment added at smcr_new_buf_create() says "Use buf_desc->len to determine the actual allocated size," but __smc_buf_create() still uses the uncapped bufsize for several critical fields. Is this correct? For SMCR_PHYS_CONT_BUFS, smcr_new_buf_create() now caps bufsize internally to PAGE_SIZE << MAX_PAGE_ORDER (typically 4MB), so buf_desc->len can be smaller than the requested bufsize. However, __smc_buf_create() continues using the original uncapped bufsize for: if (is_rmb) { conn->rmbe_size_comp = bufsize_comp; smc->sk.sk_rcvbuf = bufsize * 2; } else { smc->sk.sk_sndbuf = bufsize * 2; atomic_set(&conn->sndbuf_space, bufsize); } The sndbuf_space mismatch is particularly concerning. Looking at smc_tx.c:274, the comment states: "0 <= sndbuf_space <= sndbuf_desc->len". 
When sndbuf_space exceeds sndbuf_desc->len, smc_tx_sendmsg() could calculate writespace incorrectly: int writespace = atomic_read(&conn->sndbuf_space); and the two-chunk copy loop might write beyond the actual buffer boundary. This is reachable with default sysctl settings (SMCR_PHYS_CONT_BUFS) and socket buffer sizes exceeding 4MB. Should these assignments use buf_desc->len instead of bufsize? For example: atomic_set(&conn->sndbuf_space, buf_desc->len); smc->sk.sk_sndbuf = buf_desc->len * 2; smc->sk.sk_rcvbuf = buf_desc->len * 2; And should rmbe_size_comp be recalculated from buf_desc->len? The rmbe_size_comp value is sent to the peer via CLC messages in smc_clc.c:1125, and an inflated value might cause the peer to believe the RMB is larger than it actually is. > } > return 0; > } -- pw-bot: cr ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [net-next] net/smc: cap allocation order for SMC-R physically contiguous buffers 2026-03-17 2:45 ` [net-next] " Jakub Kicinski @ 2026-04-03 6:12 ` D. Wythe 0 siblings, 0 replies; 3+ messages in thread From: D. Wythe @ 2026-04-03 6:12 UTC (permalink / raw) To: Jakub Kicinski Cc: alibuda, netdev, edumazet, tonylu, wenjia, pabeni, guwen, davem, linux-kernel, mjambigi, dust.li, oliver.yang, sidraya, linux-s390, horms, pasic, linux-rdma On Mon, Mar 16, 2026 at 07:45:34PM -0700, Jakub Kicinski wrote: > This is an AI-generated review of your patch. The human sending this > email has considered the AI review valid, or at least plausible. > > For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html > --- > net/smc: cap allocation order for SMC-R physically contiguous buffers > > The alloc_page() cannot satisfy requests exceeding MAX_PAGE_ORDER, and > attempting such allocations will lead to guaranteed failures and potential > kernel warnings. For SMCR_PHYS_CONT_BUFS, cap the allocation order to > MAX_PAGE_ORDER. This ensures the attempts to allocate the largest possible > physically contiguous chunk succeed. For SMCR_MIXED_BUFS, if its order > exceeds MAX_PAGE_ORDER, skip the physical allocation attempt and fallback to > virtual memory immediately. > > > diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c > > index e2d083daeb7e..a18730edb7e0 100644 > > --- a/net/smc/smc_core.c > > +++ b/net/smc/smc_core.c > > @@ -2314,6 +2314,10 @@ int smcr_buf_reg_lgr(struct smc_link *lnk) > > return rc; > > } > > > > +/* > > + * smcr_new_buf_create may allocate a buffer smaller than the requested > > + * bufsize. Use buf_desc->len to determine the actual allocated size. 
> > + */ > > static struct smc_buf_desc *smcr_new_buf_create(struct smc_link_group *lgr, > > int bufsize) > > { > > @@ -2326,18 +2330,22 @@ static struct smc_buf_desc *smcr_new_buf_create(struct smc_link_group *lgr, > > > > switch (lgr->buf_type) { > > case SMCR_PHYS_CONT_BUFS: > > + bufsize = min(bufsize, (int)PAGE_SIZE << MAX_PAGE_ORDER); > > + fallthrough; > > case SMCR_MIXED_BUFS: > > buf_desc->order = get_order(bufsize); > > - buf_desc->pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN | > > - __GFP_NOMEMALLOC | __GFP_COMP | > > - __GFP_NORETRY | __GFP_ZERO, > > - buf_desc->order); > > - if (buf_desc->pages) { > > - buf_desc->cpu_addr = > > - (void *)page_address(buf_desc->pages); > > - buf_desc->len = bufsize; > > - buf_desc->is_vm = false; > > - break; > > + if (buf_desc->order <= MAX_PAGE_ORDER) { > > + buf_desc->pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN | > > + __GFP_NOMEMALLOC | __GFP_COMP | > > + __GFP_NORETRY | __GFP_ZERO, > > + buf_desc->order); > > + if (buf_desc->pages) { > > + buf_desc->cpu_addr = > > + (void *)page_address(buf_desc->pages); > > + buf_desc->len = bufsize; > > + buf_desc->is_vm = false; > > + break; > > + } > > } > > if (lgr->buf_type == SMCR_PHYS_CONT_BUFS) > > goto out; > > [ ... 
] > > > @@ -2476,7 +2484,7 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb) > > } > > > > SMC_STAT_RMB_ALLOC(smc, is_smcd, is_rmb); > > - SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, true, bufsize); > > + SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, true, buf_desc->len); > > buf_desc->used = 1; > > down_write(lock); > > smc_lgr_buf_list_add(lgr, is_rmb, buf_list, buf_desc); > > up_write(lock); > > break; /* found */ > > } > > > > if (IS_ERR(buf_desc)) > > return PTR_ERR(buf_desc); > > > > if (!is_smcd) { > > if (smcr_buf_map_usable_links(lgr, buf_desc, is_rmb)) { > > smcr_buf_unuse(buf_desc, is_rmb, lgr); > > return -ENOMEM; > > } > > } > > > > if (is_rmb) { > > conn->rmb_desc = buf_desc; > > conn->rmbe_size_comp = bufsize_comp; > > smc->sk.sk_rcvbuf = bufsize * 2; > > atomic_set(&conn->bytes_to_rcv, 0); > > conn->rmbe_update_limit = > > smc_rmb_wnd_update_limit(buf_desc->len); > > if (is_smcd) > > smc_ism_set_conn(conn); /* map RMB/smcd_dev to conn */ > > } else { > > conn->sndbuf_desc = buf_desc; > > smc->sk.sk_sndbuf = bufsize * 2; > > atomic_set(&conn->sndbuf_space, bufsize); > ^^^^^^^ > > The comment added at smcr_new_buf_create() says "Use buf_desc->len to > determine the actual allocated size," but __smc_buf_create() still uses the > uncapped bufsize for several critical fields. Is this correct? > > For SMCR_PHYS_CONT_BUFS, smcr_new_buf_create() now caps bufsize internally > to PAGE_SIZE << MAX_PAGE_ORDER (typically 4MB), so buf_desc->len can be > smaller than the requested bufsize. > > However, __smc_buf_create() continues using the original uncapped bufsize > for: > > if (is_rmb) { > conn->rmbe_size_comp = bufsize_comp; > smc->sk.sk_rcvbuf = bufsize * 2; > } else { > smc->sk.sk_sndbuf = bufsize * 2; > atomic_set(&conn->sndbuf_space, bufsize); > } > > The sndbuf_space mismatch is particularly concerning. Looking at > smc_tx.c:274, the comment states: "0 <= sndbuf_space <= sndbuf_desc->len". 
> When sndbuf_space exceeds sndbuf_desc->len, smc_tx_sendmsg() could calculate > writespace incorrectly: > > int writespace = atomic_read(&conn->sndbuf_space); > > and the two-chunk copy loop might write beyond the actual buffer boundary. > This is reachable with default sysctl settings (SMCR_PHYS_CONT_BUFS) and > socket buffer sizes exceeding 4MB. > > Should these assignments use buf_desc->len instead of bufsize? For example: > > atomic_set(&conn->sndbuf_space, buf_desc->len); > smc->sk.sk_sndbuf = buf_desc->len * 2; > smc->sk.sk_rcvbuf = buf_desc->len * 2; > > And should rmbe_size_comp be recalculated from buf_desc->len? The > rmbe_size_comp value is sent to the peer via CLC messages in smc_clc.c:1125, > and an inflated value might cause the peer to believe the RMB is larger than > it actually is. > Thanks for catching this. After rethinking this, I think we should not change the semantic of smcr_new_buf_create() to return a potentially smaller buffer than requested. The MAX_PAGE_ORDER limit is better handled in __smc_buf_create(), where the effective buffer size is already known and used to initialize socket/accounting state. I'll update the patch accordingly. > > } > > return 0; > > } > -- > pw-bot: cr ^ permalink raw reply [flat|nested] 3+ messages in thread