Re: [net-next] net/smc: cap allocation order for SMC-R physically contiguous buffers

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jakub Kicinski <kuba@kernel.org>
To: alibuda@linux.alibaba.com
Cc: Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org, edumazet@google.com,
	tonylu@linux.alibaba.com, wenjia@linux.ibm.com,
	pabeni@redhat.com, guwen@linux.alibaba.com, davem@davemloft.net,
	linux-kernel@vger.kernel.org, mjambigi@linux.ibm.com,
	dust.li@linux.alibaba.com, oliver.yang@linux.alibaba.com,
	sidraya@linux.ibm.com, linux-s390@vger.kernel.org,
	horms@kernel.org, pasic@linux.ibm.com,
	linux-rdma@vger.kernel.org
Subject: Re: [net-next] net/smc: cap allocation order for SMC-R physically contiguous buffers
Date: Mon, 16 Mar 2026 19:45:34 -0700	[thread overview]
Message-ID: <20260317024534.588136-1-kuba@kernel.org> (raw)
In-Reply-To: <20260312082154.36971-1-alibuda@linux.alibaba.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.

For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
net/smc: cap allocation order for SMC-R physically contiguous buffers

The alloc_page() cannot satisfy requests exceeding MAX_PAGE_ORDER, and
attempting such allocations will lead to guaranteed failures and potential
kernel warnings. For SMCR_PHYS_CONT_BUFS, cap the allocation order to
MAX_PAGE_ORDER. This ensures the attempts to allocate the largest possible
physically contiguous chunk succeed. For SMCR_MIXED_BUFS, if its order
exceeds MAX_PAGE_ORDER, skip the physical allocation attempt and fallback to
virtual memory immediately.

> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index e2d083daeb7e..a18730edb7e0 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -2314,6 +2314,10 @@ int smcr_buf_reg_lgr(struct smc_link *lnk)
>  	return rc;
>  }
>
> +/*
> + * smcr_new_buf_create may allocate a buffer smaller than the requested
> + * bufsize. Use buf_desc->len to determine the actual allocated size.
> + */
>  static struct smc_buf_desc *smcr_new_buf_create(struct smc_link_group *lgr,
>  						int bufsize)
>  {
> @@ -2326,18 +2330,22 @@ static struct smc_buf_desc *smcr_new_buf_create(struct smc_link_group *lgr,
>
>  	switch (lgr->buf_type) {
>  	case SMCR_PHYS_CONT_BUFS:
> +		bufsize = min(bufsize, (int)PAGE_SIZE << MAX_PAGE_ORDER);
> +		fallthrough;
>  	case SMCR_MIXED_BUFS:
>  		buf_desc->order = get_order(bufsize);
> -		buf_desc->pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN |
> -					      __GFP_NOMEMALLOC | __GFP_COMP |
> -					      __GFP_NORETRY | __GFP_ZERO,
> -					      buf_desc->order);
> -		if (buf_desc->pages) {
> -			buf_desc->cpu_addr =
> -				(void *)page_address(buf_desc->pages);
> -			buf_desc->len = bufsize;
> -			buf_desc->is_vm = false;
> -			break;
> +		if (buf_desc->order <= MAX_PAGE_ORDER) {
> +			buf_desc->pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN |
> +						      __GFP_NOMEMALLOC | __GFP_COMP |
> +						      __GFP_NORETRY | __GFP_ZERO,
> +						      buf_desc->order);
> +			if (buf_desc->pages) {
> +				buf_desc->cpu_addr =
> +					(void *)page_address(buf_desc->pages);
> +				buf_desc->len = bufsize;
> +				buf_desc->is_vm = false;
> +				break;
> +			}
>  		}
>  		if (lgr->buf_type == SMCR_PHYS_CONT_BUFS)
>  			goto out;

[ ... ]

> @@ -2476,7 +2484,7 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb)
>  		}
>
>  		SMC_STAT_RMB_ALLOC(smc, is_smcd, is_rmb);
> -		SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, true, bufsize);
> +		SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, true, buf_desc->len);
>  		buf_desc->used = 1;
>  		down_write(lock);
>  		smc_lgr_buf_list_add(lgr, is_rmb, buf_list, buf_desc);
>  		up_write(lock);
>  		break; /* found */
>  	}
>
>  	if (IS_ERR(buf_desc))
>  		return PTR_ERR(buf_desc);
>
>  	if (!is_smcd) {
>  		if (smcr_buf_map_usable_links(lgr, buf_desc, is_rmb)) {
>  			smcr_buf_unuse(buf_desc, is_rmb, lgr);
>  			return -ENOMEM;
>  		}
>  	}
>
>  	if (is_rmb) {
>  		conn->rmb_desc = buf_desc;
>  		conn->rmbe_size_comp = bufsize_comp;
>  		smc->sk.sk_rcvbuf = bufsize * 2;
>  		atomic_set(&conn->bytes_to_rcv, 0);
>  		conn->rmbe_update_limit =
>  			smc_rmb_wnd_update_limit(buf_desc->len);
>  		if (is_smcd)
>  			smc_ism_set_conn(conn); /* map RMB/smcd_dev to conn */
>  	} else {
>  		conn->sndbuf_desc = buf_desc;
>  		smc->sk.sk_sndbuf = bufsize * 2;
>  		atomic_set(&conn->sndbuf_space, bufsize);
                                                 ^^^^^^^

The comment added at smcr_new_buf_create() says "Use buf_desc->len to
determine the actual allocated size," but __smc_buf_create() still uses the
uncapped bufsize for several critical fields. Is this correct?

For SMCR_PHYS_CONT_BUFS, smcr_new_buf_create() now caps bufsize internally
to PAGE_SIZE << MAX_PAGE_ORDER (typically 4MB), so buf_desc->len can be
smaller than the requested bufsize.

However, __smc_buf_create() continues using the original uncapped bufsize
for:

    if (is_rmb) {
        conn->rmbe_size_comp = bufsize_comp;
        smc->sk.sk_rcvbuf = bufsize * 2;
    } else {
        smc->sk.sk_sndbuf = bufsize * 2;
        atomic_set(&conn->sndbuf_space, bufsize);
    }

The sndbuf_space mismatch is particularly concerning. Looking at
smc_tx.c:274, the comment states: "0 <= sndbuf_space <= sndbuf_desc->len".
When sndbuf_space exceeds sndbuf_desc->len, smc_tx_sendmsg() could calculate
writespace incorrectly:

    int writespace = atomic_read(&conn->sndbuf_space);

and the two-chunk copy loop might write beyond the actual buffer boundary.
This is reachable with default sysctl settings (SMCR_PHYS_CONT_BUFS) and
socket buffer sizes exceeding 4MB.

Should these assignments use buf_desc->len instead of bufsize? For example:

    atomic_set(&conn->sndbuf_space, buf_desc->len);
    smc->sk.sk_sndbuf = buf_desc->len * 2;
    smc->sk.sk_rcvbuf = buf_desc->len * 2;

And should rmbe_size_comp be recalculated from buf_desc->len? The
rmbe_size_comp value is sent to the peer via CLC messages in smc_clc.c:1125,
and an inflated value might cause the peer to believe the RMB is larger than
it actually is.

>  	}
>  	return 0;
>  }
-- 
pw-bot: cr

next prev parent reply	other threads:[~2026-03-17  2:45 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-12  8:21 [PATCH net-next] net/smc: cap allocation order for SMC-R physically contiguous buffers D. Wythe
2026-03-17  2:45 ` Jakub Kicinski [this message]
2026-04-03  6:12   ` [net-next] " D. Wythe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260317024534.588136-1-kuba@kernel.org \
    --to=kuba@kernel.org \
    --cc=alibuda@linux.alibaba.com \
    --cc=davem@davemloft.net \
    --cc=dust.li@linux.alibaba.com \
    --cc=edumazet@google.com \
    --cc=guwen@linux.alibaba.com \
    --cc=horms@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mjambigi@linux.ibm.com \
    --cc=netdev@vger.kernel.org \
    --cc=oliver.yang@linux.alibaba.com \
    --cc=pabeni@redhat.com \
    --cc=pasic@linux.ibm.com \
    --cc=sidraya@linux.ibm.com \
    --cc=tonylu@linux.alibaba.com \
    --cc=wenjia@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.