netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
To: Ben Hutchings <bhutchings-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
Subject: Re: [PATCH 2/2 net-next] openvswitch: Use skb_zerocopy() for upcall
Date: Sun, 10 Nov 2013 23:40:33 +0000	[thread overview]
Message-ID: <20131110234033.GA18099@casper.infradead.org> (raw)
In-Reply-To: <1384034549.3802.41.camel-/LGg1Z1CJKQ+9kgCwbf1HqK4ta4zdZpAajtMo4Cw6ucAvxtiuMwx3w@public.gmane.org>

On 11/09/13 at 10:02pm, Ben Hutchings wrote:
> On Fri, 2013-11-08 at 10:15 +0100, Thomas Graf wrote:
> > Use of skb_zerocopy() avoids the expensive call to memcpy() when
> > copying the packet data into the Netlink skb. Completes checksum
> > through skb_checksum_help() if needed.
> > 
> > Netlink messaged must be properly padded and aligned to meet
> > sanity checks of the user space counterpart.
> > 
> > Cost of memcpy is significantly reduced from:
> > +   7.48%       vhost-8471  [k] memcpy
> > +   5.57%     ovs-vswitchd  [k] memcpy
> > +   2.81%       vhost-8471  [k] csum_partial_copy_generic
> > 
> > to:
> > +   5.72%     ovs-vswitchd  [k] memcpy
> > +   3.32%       vhost-5153  [k] memcpy
> > +   0.68%       vhost-5153  [k] skb_zerocopy
> > 
> > (megaflows disabled)
> > 
> > Signed-off-by: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
> > ---
> >  net/openvswitch/datapath.c | 52 +++++++++++++++++++++++++++++++++++++++-------
> >  1 file changed, 45 insertions(+), 7 deletions(-)
> > 
> > diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> > index 1408adc..3f170e3 100644
> > --- a/net/openvswitch/datapath.c
> > +++ b/net/openvswitch/datapath.c
> [...]
> > @@ -441,13 +449,43 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
> >  			  nla_len(upcall_info->userdata),
> >  			  nla_data(upcall_info->userdata));
> >  
> > -	nla = __nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, skb->len);
> > +	/* Only reserve room for attribute header, packet data is added
> > +	 * in skb_zerocopy() */
> > +	if (!(nla = nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, 0)))
> > +		goto out;
> > +	nla->nla_len = nla_attr_size(skb->len);
> >  
> > -	skb_copy_and_csum_dev(skb, nla_data(nla));
> > +	skb_zerocopy(user_skb, skb, skb->len, hlen);
> >  
> > -	genlmsg_end(user_skb, upcall);
> > -	err = genlmsg_unicast(net, user_skb, upcall_info->portid);
> > +	/* OVS user space expects the size of the message to be aligned to
> > +	 * NLA_ALIGNTO. Aligning nlmsg_len is not enough, the actual bytes
> > +	 * read must match nlmsg_len.
> > +	 */
> > +	plen = NLA_ALIGN(user_skb->len) - user_skb->len;
> > +	if (plen > 0) {
> > +		int nr_frags = skb_shinfo(user_skb)->nr_frags;
> > +
> > +		if (nr_frags) {
> > +			skb_frag_t *frag;
> > +
> > +			frag = &skb_shinfo(user_skb)->frags[nr_frags -1];
> > +			skb_frag_size_add(frag, plen);
> 
> It looks like this is effectively padding with whatever happens to
> follow the original packet content.  This could result in a small
> information leak.  If the fragment has non-zero offset and already
> extends to the end of a page, this could result in a segfault as the
> next page may be unmapped.
> 
> Perhaps you could add the padding as an extra fragment pointing to a
> preallocated zero page.  If the skb already has the maximum number of
> fragments, you would have to copy the last fragment in order to add
> padding.

You are right and thanks for the review Ben.

Realizing how complex this becomes I'm leaning towards avoiding
padding alltogether by fixing OVS user space to no longer enforce
it, signal this capability via a flag to the kernel and only
perform zerocopy for enabled OVS user space counterparts.

  parent reply	other threads:[~2013-11-10 23:40 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-08  9:15 [PATCH 0/2 v3] Open vSwitch zerocopy upcall Thomas Graf
     [not found] ` <cover.1383901577.git.tgraf-G/eBtMaohhA@public.gmane.org>
2013-11-08  9:15   ` [PATCH 1/2 net-next] net: Export skb_zerocopy() to zerocopy from one skb to another Thomas Graf
2013-11-08  9:15 ` [PATCH 2/2 net-next] openvswitch: Use skb_zerocopy() for upcall Thomas Graf
2013-11-09 22:02   ` Ben Hutchings
     [not found]     ` <1384034549.3802.41.camel-/LGg1Z1CJKQ+9kgCwbf1HqK4ta4zdZpAajtMo4Cw6ucAvxtiuMwx3w@public.gmane.org>
2013-11-10 23:40       ` Thomas Graf [this message]
2013-11-11  1:55         ` Jesse Gross
  -- strict thread matches above, loose matches on Subject: below --
2013-11-08 10:47 [PATCH 0/2 v4] Open vSwitch zerocopy upcall Thomas Graf
2013-11-08 10:47 ` [PATCH 2/2 net-next] openvswitch: Use skb_zerocopy() for upcall Thomas Graf
     [not found]   ` <9369d871e13fe5b6e468b269450b9f2032a970aa.1383906944.git.tgraf-G/eBtMaohhA@public.gmane.org>
2013-11-08 11:24     ` Daniel Borkmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131110234033.GA18099@casper.infradead.org \
    --to=tgraf-g/ebtmaohha@public.gmane.org \
    --cc=bhutchings-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org \
    --cc=davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org \
    --cc=dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org \
    --cc=eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).