From: Paolo Abeni <pabeni@redhat.com>
To: Feng Yang <yangfeng59949@163.com>, stfomichev@gmail.com
Cc: aleksander.lobakin@intel.com, almasrymina@google.com,
	asml.silence@gmail.com, davem@davemloft.net, ebiggers@google.com,
	edumazet@google.com, horms@kernel.org, kerneljasonxing@gmail.com,
	kuba@kernel.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, willemb@google.com, yangfeng@kylinos.cn
Subject: Re: [PATCH] skbuff: Improve the sending efficiency of __skb_send_sock
Date: Thu, 26 Jun 2025 10:31:09 +0200
Message-ID: <a21e5d42-5718-4633-b812-be47ec6acf65@redhat.com>
In-Reply-To: <20250626075020.95425-1-yangfeng59949@163.com>

On 6/26/25 9:50 AM, Feng Yang wrote:
> On Wed, 25 Jun 2025 11:35:55 -0700, Stanislav Fomichev <stfomichev@gmail.com> wrote:
>> On 06/23, Feng Yang wrote:
>>> From: Feng Yang <yangfeng@kylinos.cn>
>>>
>>> By aggregating skb data into a bvec array for transmission, forwarding
>>> large packets via sockmap now takes a single transmission where multiple
>>> were previously required, which significantly improves performance.
>>> For small packets, performance remains comparable to the original level.
>>>
>>> When using sockmap for forwarding, the average latency for different packet sizes
>>> after sending 10,000 packets is as follows:
>>> size	old(us)		new(us)
>>> 512	56		55
>>> 1472	58		58
>>> 1600	106		79
>>> 3000	145		108
>>> 5000	182		123
>>>
>>> Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
>>> ---
>>>  net/core/skbuff.c | 112 +++++++++++++++++++++-------------------------
>>>  1 file changed, 52 insertions(+), 60 deletions(-)
>>>
>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>> index 85fc82f72d26..664443fc9baf 100644
>>> --- a/net/core/skbuff.c
>>> +++ b/net/core/skbuff.c
>>> @@ -3235,84 +3235,75 @@ typedef int (*sendmsg_func)(struct sock *sk, struct msghdr *msg);
>>>  static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
>>>  			   int len, sendmsg_func sendmsg, int flags)
>>>  {
>>> -	unsigned int orig_len = len;
>>>  	struct sk_buff *head = skb;
>>>  	unsigned short fragidx;
>>> -	int slen, ret;
>>> +	struct msghdr msg;
>>> +	struct bio_vec *bvec;
>>> +	int max_vecs, ret, slen;
>>> +	int bvec_count = 0;
>>> +	unsigned int copied = 0;
>>>  
>>> -do_frag_list:
>>> -
>>> -	/* Deal with head data */
>>> -	while (offset < skb_headlen(skb) && len) {
>>> -		struct kvec kv;
>>> -		struct msghdr msg;
>>> -
>>> -		slen = min_t(int, len, skb_headlen(skb) - offset);
>>> -		kv.iov_base = skb->data + offset;
>>> -		kv.iov_len = slen;
>>> -		memset(&msg, 0, sizeof(msg));
>>> -		msg.msg_flags = MSG_DONTWAIT | flags;
>>> -
>>> -		iov_iter_kvec(&msg.msg_iter, ITER_SOURCE, &kv, 1, slen);
>>> -		ret = INDIRECT_CALL_2(sendmsg, sendmsg_locked,
>>> -				      sendmsg_unlocked, sk, &msg);
>>> -		if (ret <= 0)
>>> -			goto error;
>>> +	max_vecs = skb_shinfo(skb)->nr_frags + 1; // +1 for linear data
>>> +	if (skb_has_frag_list(skb)) {
>>> +		struct sk_buff *frag_skb = skb_shinfo(skb)->frag_list;
>>>  
>>> -		offset += ret;
>>> -		len -= ret;
>>> +		while (frag_skb) {
>>> +			max_vecs += skb_shinfo(frag_skb)->nr_frags + 1; // +1 for linear data
>>> +			frag_skb = frag_skb->next;
>>> +		}
>>>  	}
>>>  
>>> -	/* All the data was skb head? */
>>> -	if (!len)
>>> -		goto out;
>>> +	bvec = kcalloc(max_vecs, sizeof(struct bio_vec), GFP_KERNEL);
>>> +	if (!bvec)
>>> +		return -ENOMEM;
>>
>> Not sure allocating memory here is a good idea. From what I can tell
>> this function is used by non-sockmap callers as well..

Adding a per-packet allocation and a free is IMHO a no-go for a patch
intended to improve performance.

> Alternatively, we can use struct bio_vec bvec[size] to avoid memory allocation.

If you mean using a fixed-size bio_vec array allocated on the stack, that
could work...
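
E.g. something along these lines (sketch only; the define name and the
size are purely illustrative, not an existing define):

	/* illustrative cap on the number of on-stack vectors */
	#define SKB_SEND_SOCK_MAX_BVECS	32

	struct bio_vec bvec[SKB_SEND_SOCK_MAX_BVECS];

with the array kept small enough to bound the stack usage.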

> Even if the "size" is insufficient, the unsent portion will be transmitted in the next call to `__skb_send_sock`.

... but I think this part is not acceptable: the callers may/should
already assume that partial transmissions are due to errors.

Instead I think you should loop, sending one batch of at most
bio_vec_size entries per iteration.
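
Completely untested sketch of what I mean below; skb_to_bvec() does not
exist and would have to be written, gathering at most ARRAY_SIZE(bvec)
segments (head, frags, frag_list) starting at 'offset' and returning the
number of entries filled plus the total byte count via 'batch':

	while (len) {
		size_t batch;
		int nr;

		/* hypothetical helper, see above */
		nr = skb_to_bvec(skb, bvec, ARRAY_SIZE(bvec),
				 offset, len, &batch);

		memset(&msg, 0, sizeof(msg));
		msg.msg_flags = MSG_DONTWAIT | flags;
		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bvec, nr, batch);

		ret = INDIRECT_CALL_2(sendmsg, sendmsg_locked,
				      sendmsg_unlocked, sk, &msg);
		if (ret <= 0)
			goto error;

		offset += ret;
		len -= ret;
	}

That way a single on-stack array is enough regardless of the skb
geometry, and the function still completes the whole range (or fails
with a real error), so the callers keep the current semantics.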

Side note: the patch has a few style issues:
- it should not use // for comments
- variable declarations should respect the reverse christmas tree order

and possibly you could use this refactoring to avoid the use of the
backward goto statement.
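
For reference, with the declarations in this patch the reverse christmas
tree order would be (longest line first, shortest last):

	struct sk_buff *head = skb;
	unsigned int copied = 0;
	int max_vecs, ret, slen;
	unsigned short fragidx;
	struct bio_vec *bvec;
	int bvec_count = 0;
	struct msghdr msg;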

Thanks,

Paolo

