Re: [PATCH v2] net/packet: support mergeable feautre of virtio

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: "沈安琪(凛玥)" <amy.saq@antgroup.com>,
	netdev@vger.kernel.org, davem@davemloft.net, jasowang@redhat.com,
	谈鉴锋 <henry.tjf@antgroup.com>
Subject: Re: [PATCH v2] net/packet: support mergeable feautre of virtio
Date: Mon, 27 Feb 2023 12:46:58 -0500
Message-ID: <20230227124131-mutt-send-email-mst@kernel.org>
In-Reply-To: <63fcdaf7e3e9d_1684422084b@willemb.c.googlers.com.notmuch>

typo in $subj

On Mon, Feb 27, 2023 at 11:31:51AM -0500, Willem de Bruijn wrote:
> 沈安琪(凛玥) wrote:
> > From: Jianfeng Tan <henry.tjf@antgroup.com>
> > 
> > Packet sockets, like tap, can be used as the backend for kernel vhost.
> > In packet sockets, virtio net header size is currently hardcoded to be
> > the size of struct virtio_net_hdr, which is 10 bytes; however, it is not
> > always the case: some virtio features, such as mrg_rxbuf, need virtio
> > net header to be 12-byte long.
> > 
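For illustration only (not part of the patch): a minimal userspace sketch of the two header sizes discussed above. The struct names `vnet_hdr` and `vnet_hdr_mrg_rxbuf` and the helper `misparsed_bytes` are hypothetical stand-ins; the field layout is assumed to mirror struct virtio_net_hdr and struct virtio_net_hdr_mrg_rxbuf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the basic virtio net header layout (10 bytes). */
struct vnet_hdr {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Hypothetical mirror of the mergeable-rx-buffer variant (12 bytes). */
struct vnet_hdr_mrg_rxbuf {
	struct vnet_hdr hdr;
	uint16_t num_buffers;
};

_Static_assert(sizeof(struct vnet_hdr) == 10, "basic header is 10 bytes");
_Static_assert(sizeof(struct vnet_hdr_mrg_rxbuf) == 12, "mrg_rxbuf header is 12 bytes");

/*
 * Number of trailing virtio header bytes that would be treated as the
 * start of the mac header when the receiver assumes a shorter header
 * than the sender actually wrote.
 */
static size_t misparsed_bytes(size_t actual_hdr_sz, size_t assumed_hdr_sz)
{
	return actual_hdr_sz > assumed_hdr_sz ? actual_hdr_sz - assumed_hdr_sz : 0;
}
```

With a 12-byte header on the wire and a hardcoded 10-byte assumption, the helper reports 2 bytes, which is the offset error the commit message describes.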
> > Mergeable buffers, as a virtio feature, is worthy to support: packets

worthy of

> > that larger than one-mbuf size will be dropped in vhost worker's

are larger

> > handle_rx if mrg_rxbuf feature is not used, but large packets
> > cannot be avoided and increasing mbuf's size is not economical.
> > 
> > With this virtio feature enabled, packet sockets with hardcoded 10-byte

you mean with this feature enabled in guest but without support in tap


> > virtio net header will parse mac head incorrectly in packet_snd by taking
> > the last two bytes of virtio net header as part of mac header as well.

as well as what?

> > This incorrect mac header parsing will cause packet be dropped due to

to be dropped

> > invalid ether head checking in later under-layer device packet receiving.
> > 
> > By adding extra field vnet_hdr_sz with utilizing holes in struct
> > packet_sock to record current using virtio net header size and supporting

currently used

> > extra sockopt PACKET_VNET_HDR_SZ to set specified vnet_hdr_sz, packet
> > sockets can know the exact length of virtio net header that virtio user
> > gives.
> > In packet_snd, tpacket_snd and packet_recvmsg, instead of using hardcode

hardcoded

> > virtio net header size, it can get the exact vnet_hdr_sz from corresponding
> > packet_sock, and parse mac header correctly based on this information to
> > avoid the packets being mistakenly dropped.
> > 
> > Signed-off-by: Jianfeng Tan <henry.tjf@antgroup.com>
> > Co-developed-by: Anqi Shen <amy.saq@antgroup.com>
> > Signed-off-by: Anqi Shen <amy.saq@antgroup.com>
> 
> net-next is closed


> > @@ -2311,7 +2312,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> >  				       (maclen < 16 ? 16 : maclen)) +
> >  				       po->tp_reserve;
> >  		if (po->has_vnet_hdr) {
> > -			netoff += sizeof(struct virtio_net_hdr);
> > +			netoff += po->vnet_hdr_sz;
> >  			do_vnet = true;
> >  		}
> >  		macoff = netoff - maclen;
> > @@ -2552,16 +2553,23 @@ static int __packet_snd_vnet_parse(struct virtio_net_hdr *vnet_hdr, size_t len)
> >  }
> >  
> >  static int packet_snd_vnet_parse(struct msghdr *msg, size_t *len,
> > -				 struct virtio_net_hdr *vnet_hdr)
> > +				 struct virtio_net_hdr *vnet_hdr, int vnet_hdr_sz)
> >  {
> > -	if (*len < sizeof(*vnet_hdr))
> > +	int ret;
> > +
> > +	if (*len < vnet_hdr_sz)
> >  		return -EINVAL;
> > -	*len -= sizeof(*vnet_hdr);
> > +	*len -= vnet_hdr_sz;
> >  
> >  	if (!copy_from_iter_full(vnet_hdr, sizeof(*vnet_hdr), &msg->msg_iter))
> >  		return -EFAULT;
> >  
> > -	return __packet_snd_vnet_parse(vnet_hdr, *len);
> > +	ret = __packet_snd_vnet_parse(vnet_hdr, *len);
> > +
> > +	/* move iter to point to the start of mac header */
> > +	if (ret == 0)
> > +		iov_iter_advance(&msg->msg_iter, vnet_hdr_sz - sizeof(struct virtio_net_hdr));
> > +	return ret;
> 
> Let's make the error path the exception
> 
>         if (ret)
>                 return ret;
> 
> And maybe avoid calling iov_iter_advance if vnet_hdr_sz == sizeof(*vnet_hdr)
> 
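For illustration only: a userspace sketch of the control flow suggested above, with the error path returning early and the extra advance skipped when the sizes match. Everything here is a hypothetical stand-in: `struct cursor` plays the role of the msg iterator, `check_hdr` stands in for __packet_snd_vnet_parse (it only checks hdr_len against the remaining length, in native byte order), and the function assumes vnet_hdr_sz is never smaller than the basic header.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the 10-byte basic virtio net header. */
struct vnet_hdr {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Hypothetical stand-in for the message iterator. */
struct cursor {
	const unsigned char *buf;
	size_t len;	/* total bytes in buf */
	size_t pos;	/* bytes consumed so far */
};

/* Simplified stand-in for the header sanity check. */
static int check_hdr(const struct vnet_hdr *hdr, size_t len)
{
	return hdr->hdr_len > len ? -EINVAL : 0;
}

static int vnet_parse(struct cursor *c, size_t *len,
		      struct vnet_hdr *hdr, size_t vnet_hdr_sz)
{
	int ret;

	assert(vnet_hdr_sz >= sizeof(*hdr));	/* assumption of this sketch */

	if (*len < vnet_hdr_sz)
		return -EINVAL;
	*len -= vnet_hdr_sz;

	if (c->len - c->pos < sizeof(*hdr))
		return -EFAULT;
	memcpy(hdr, c->buf + c->pos, sizeof(*hdr));
	c->pos += sizeof(*hdr);

	ret = check_hdr(hdr, *len);
	if (ret)
		return ret;	/* error path is the exception */

	/* Skip the extra header bytes only when there are any. */
	if (vnet_hdr_sz > sizeof(*hdr))
		c->pos += vnet_hdr_sz - sizeof(*hdr);

	return 0;
}
```

After a successful call with a 12-byte header size, the cursor sits at the start of the mac header and *len holds the payload length.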
> >  	case PACKET_VNET_HDR:
> > +	case PACKET_VNET_HDR_SZ:
> >  	{
> >  		int val;
> > +		int hdr_len = 0;
> >  
> >  		if (sock->type != SOCK_RAW)
> >  			return -EINVAL;
> > @@ -3931,11 +3945,23 @@ static void packet_flush_mclist(struct sock *sk)
> >  		if (copy_from_sockptr(&val, optval, sizeof(val)))
> >  			return -EFAULT;
> >  
> > +		if (optname == PACKET_VNET_HDR_SZ) {
> > +			if (val != sizeof(struct virtio_net_hdr) &&
> > +			    val != sizeof(struct virtio_net_hdr_mrg_rxbuf))
> > +				return -EINVAL;
> > +			hdr_len = val;
> > +		}
> > +
> 
>     } else {
>             hdr_len = sizeof(struct virtio_net_hdr);
>     }
> 
> >  		lock_sock(sk);
> >  		if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) {
> >  			ret = -EBUSY;
> >  		} else {
> > -			po->has_vnet_hdr = !!val;
> > +			if (optname == PACKET_VNET_HDR) {
> > +				po->has_vnet_hdr = !!val;
> > +				if (po->has_vnet_hdr)
> > +					hdr_len = sizeof(struct virtio_net_hdr);
> > +			}
> > +			po->vnet_hdr_sz = hdr_len;
> 
> then this is not needed
> >  			ret = 0;
> >  		}
> >  		release_sock(sk);
> > @@ -4070,6 +4096,9 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
> >  	case PACKET_VNET_HDR:
> >  		val = po->has_vnet_hdr;
> >  		break;
> > +	case PACKET_VNET_HDR_SZ:
> > +		val = po->vnet_hdr_sz;
> > +		break;
> >  	case PACKET_VERSION:
> >  		val = po->tp_version;
> >  		break;
> > diff --git a/net/packet/internal.h b/net/packet/internal.h
> > index 48af35b..e27b47d 100644
> > --- a/net/packet/internal.h
> > +++ b/net/packet/internal.h
> > @@ -121,7 +121,8 @@ struct packet_sock {
> >  				origdev:1,
> >  				has_vnet_hdr:1,
> >  				tp_loss:1,
> > -				tp_tx_has_off:1;
> > +				tp_tx_has_off:1,
> > +				vnet_hdr_sz:8;	/* vnet header size should use */
> 
> has_vnet_hdr is no longer needed when adding vnet_hdr_sz. removing that simplifies the code
> 
> drop the comment. That is quite self explanatory from the variable name.

besides, it's agrammatical :)
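For illustration only: a userspace sketch of how the sockopt handling could look if vnet_hdr_sz also replaces has_vnet_hdr, as suggested above. All names (`OPT_VNET_HDR`, `OPT_VNET_HDR_SZ`, `struct sock_state`, `set_vnet_opt`, `get_vnet_hdr`) are hypothetical stand-ins, and treating a zero value for the legacy option as "header disabled" (size 0) is an assumption of this sketch rather than something the thread settles.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PACKET_VNET_HDR and PACKET_VNET_HDR_SZ. */
enum { OPT_VNET_HDR, OPT_VNET_HDR_SZ };

/* Sizes of the basic and mergeable-rx-buffer headers, in bytes. */
enum { VNET_HDR_LEN = 10, VNET_HDR_MRG_RXBUF_LEN = 12 };

/* Hypothetical stand-in for the relevant packet_sock state. */
struct sock_state {
	unsigned int vnet_hdr_sz;	/* 0 means no virtio net header */
	bool ring_mapped;		/* rx or tx ring already set up */
};

static int set_vnet_opt(struct sock_state *po, int optname, int val)
{
	int hdr_len;

	if (optname == OPT_VNET_HDR_SZ) {
		if (val != VNET_HDR_LEN && val != VNET_HDR_MRG_RXBUF_LEN)
			return -EINVAL;
		hdr_len = val;
	} else {
		/* Legacy boolean option maps to the basic header size. */
		hdr_len = val ? VNET_HDR_LEN : 0;
	}

	if (po->ring_mapped)
		return -EBUSY;

	po->vnet_hdr_sz = hdr_len;
	return 0;
}

/* The legacy getter can be derived from the size, so no separate flag is needed. */
static int get_vnet_hdr(const struct sock_state *po)
{
	return po->vnet_hdr_sz != 0;
}
```

Computing hdr_len in one if/else before taking the lock keeps the locked section down to a single assignment.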

> >  	int			pressure;
> >  	int			ifindex;	/* bound device		*/
> >  	__be16			num;
> > -- 
> > 1.8.3.1
> > 
>

Thread overview: 6+ messages
2023-02-27 11:33 [PATCH v2] net/packet: support mergeable feautre of virtio 沈安琪(凛玥)
2023-02-27 16:31 ` Willem de Bruijn
2023-02-27 17:46   ` Michael S. Tsirkin [this message]
2023-02-28  2:27   ` 沈安琪(凛玥)
2023-02-28  4:31     ` Willem de Bruijn
2023-02-28  4:38       ` 沈安琪(凛玥)