All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Hangbin Liu <liuhangbin@gmail.com>
Cc: "Toke Høiland-Jørgensen" <toke@redhat.com>,
	bpf@vger.kernel.org, netdev@vger.kernel.org,
	"Jiri Benc" <jbenc@redhat.com>,
	"Eelco Chaudron" <echaudro@redhat.com>,
	ast@kernel.org, "Daniel Borkmann" <daniel@iogearbox.net>,
	"Lorenzo Bianconi" <lorenzo.bianconi@redhat.com>,
	"David Ahern" <dsahern@gmail.com>,
	"Andrii Nakryiko" <andrii.nakryiko@gmail.com>,
	"Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Maciej Fijalkowski" <maciej.fijalkowski@intel.com>,
	"Björn Töpel" <bjorn.topel@gmail.com>,
	"Martin KaFai Lau" <kafai@fb.com>,
	brouer@redhat.com
Subject: Re: [PATCHv9 bpf-next 2/4] xdp: extend xdp_redirect_map with broadcast support
Date: Mon, 26 Apr 2021 11:23:08 +0200	[thread overview]
Message-ID: <20210426112308.580cf98e@carbon> (raw)
In-Reply-To: <20210426060117.GN3465@Leo-laptop-t470s>

On Mon, 26 Apr 2021 14:01:17 +0800
Hangbin Liu <liuhangbin@gmail.com> wrote:

> On Sat, Apr 24, 2021 at 09:01:29AM +0200, Jesper Dangaard Brouer wrote:
> > > > > >> @@ -3942,7 +3960,12 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> > > > > >>  	case BPF_MAP_TYPE_DEVMAP:
> > > > > >>  		fallthrough;
> > > > > >>  	case BPF_MAP_TYPE_DEVMAP_HASH:
> > > > > >> -		err = dev_map_enqueue(fwd, xdp, dev);
> > > > > >> +		map = xchg(&ri->map, NULL);      
> > > > > >
> > > > > > Hmm, this looks dangerous for performance to have on this fast-path.
> > > > > > The xchg call can be expensive, AFAIK this is an atomic operation.      
> > > > > 
> > > > > Ugh, you're right. That's my bad, I suggested replacing the
> > > > > READ_ONCE()/WRITE_ONCE() pair with the xchg() because an exchange is
> > > > > what it's doing, but I failed to consider the performance implications
> > > > > of the atomic operation. Sorry about that, Hangbin! I guess this should
> > > > > be changed to:
> > > > > 
> > > > > +		map = READ_ONCE(ri->map);
> > > > > +		if (map) {
> > > > > +			WRITE_ONCE(ri->map, NULL);
> > > > > +			err = dev_map_enqueue_multi(xdp, dev, map,
> > > > > +						    ri->flags & BPF_F_EXCLUDE_INGRESS);
> > > > > +		} else {
> > > > > +			err = dev_map_enqueue(fwd, xdp, dev);
> > > > > +		}    
> > > > 
> > > > This is highly sensitive fast-path code, as you saw Bjørn have been
> > > > hunting nanosec in this area.  The above code implicitly have "map" as
> > > > the likely option, which I don't think it is.    
> > > 
> > > Hi Jesper,
> > > 
> > > From the performance data, there is only a slightly impact. Do we still need
> > > to block the whole patch on this? Or if you have a better solution?  
> > 
> > I'm basically just asking you to add an unlikely() annotation:
> > 
> > 	map = READ_ONCE(ri->map);
> > 	if (unlikely(map)) {
> > 		WRITE_ONCE(ri->map, NULL);
> > 		err = dev_map_enqueue_multi(xdp, dev, map, [...]
> > 
> > For XDP, performance is the single most important factor!  You say your
> > performance data, there is only a slightly impact, there must be ZERO
> > impact (when your added features is not in use).
> > 
> > You data:
> >  Version          | Test                                | Generic | Native
> >  5.12 rc4         | redirect_map        i40e->i40e      |    1.9M |  9.6M
> >  5.12 rc4 + patch | redirect_map        i40e->i40e      |    1.9M |  9.3M
> > 
> > The performance difference 9.6M -> 9.3M is a slowdown of 3.36 nanosec.
> > Bjørn and others have been working really hard to optimize the code and
> > remove down to 1.5 nanosec overheads.  Thus, introducing 3.36 nanosec
> > added overhead to the fast-path is significant.  
> 
> I re-check the performance data. The data
> > Version          | Test                                | Generic | Native
> > 5.12 rc4         | redirect_map        i40e->i40e      |    1.9M |  9.6M
> > 5.12 rc4 + patch | redirect_map        i40e->i40e      |    1.9M |  9.3M  
> 
> is done on version 5.
> 
> Today I re-did the test, on version 10, with xchg() changed to
> READ_ONCE/WRITE_ONCE. Here is the new data (Generic path data was omitted
> as there is no change)
> 
> Version          | Test                                | Generic | Native
> 5.12 rc4         | redirect_map        i40e->i40e      |  9.7M
> 5.12 rc4         | redirect_map        i40e->veth      | 11.8M
> 
> 5.12 rc4 + patch | redirect_map        i40e->i40e      |  9.6M

Great to see the baseline redirect_map (i40e->i40e) have almost no
impact, only 1.07 ns ((1/9.7-1/9.6)*1000), which is what we want to
see.  (It might be zero as measurements can fluctuate when diff is
below 2ns)


> 5.12 rc4 + patch | redirect_map        i40e->veth      | 11.6M

What XDP program are you running on the inner veth?

> 5.12 rc4 + patch | redirect_map multi  i40e->i40e      |  9.5M

I'm very surprised to see redirect_map multi being so fast (9.5M vs.
9.6M normal map-redir).  I was expecting to see larger overhead, as the
code dev_map_enqueue_clone() would clone the packet in xdpf_clone() via
allocating a new page (dev_alloc_page) and then doing a memcpy().

Looking closer at this patchset, I realize that the test
'redirect_map-multi' is testing an optimization, and will never call
dev_map_enqueue_clone() + xdpf_clone().  IMHO trying to optimize
'redirect_map-multi' to be just as fast as base 'redirect_map' doesn't
make much sense.  If the 'broadcast' call only send a single packet,
then there isn't any reason to call the 'multi' variant.

Does the 'selftests/bpf' make sure to activate the code path that does
cloning?

> 5.12 rc4 + patch | redirect_map multi  i40e->veth      | 11.5M
> 5.12 rc4 + patch | redirect_map multi  i40e->mlx4+veth |  3.9M
> 
> And after add unlikely() in the check path, the new data looks like
> 
> Version          | Test                                | Native
> 5.12 rc4 + patch | redirect_map        i40e->i40e      |  9.6M
> 5.12 rc4 + patch | redirect_map        i40e->veth      | 11.7M
> 5.12 rc4 + patch | redirect_map multi  i40e->i40e      |  9.4M
> 5.12 rc4 + patch | redirect_map multi  i40e->veth      | 11.4M
> 5.12 rc4 + patch | redirect_map multi  i40e->mlx4+veth |  3.8M
> 
> So with unlikely(), the redirect_map is a slightly up, while redirect_map
> broadcast has a little drawback. But for the total data it looks this time
> there is no much gap compared with no this patch for redirect_map.
> 
> Do you think we still need the unlikely() in check path?

Yes.  The call to redirect_map multi is allowed (and expected) to be
slower, because when using it to broadcast packets we expect that
dev_map_enqueue_clone() + xdpf_clone() will get activated, which will
be the dominating overhead.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


  reply	other threads:[~2021-04-26  9:23 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-22  7:14 [PATCHv9 bpf-next 0/4] xdp: extend xdp_redirect_map with broadcast support Hangbin Liu
2021-04-22  7:14 ` [PATCHv9 bpf-next 1/4] bpf: run devmap xdp_prog on flush instead of bulk enqueue Hangbin Liu
2021-04-22  7:14 ` [PATCHv9 bpf-next 2/4] xdp: extend xdp_redirect_map with broadcast support Hangbin Liu
2021-04-22 16:53   ` Jesper Dangaard Brouer
2021-04-22 18:02     ` Toke Høiland-Jørgensen
2021-04-23 16:54       ` Jesper Dangaard Brouer
2021-04-24  1:09         ` Hangbin Liu
2021-04-24  7:01           ` Jesper Dangaard Brouer
2021-04-24  9:53             ` Toke Høiland-Jørgensen
2021-04-24 13:55               ` Hangbin Liu
2021-04-26  6:01             ` Hangbin Liu
2021-04-26  9:23               ` Jesper Dangaard Brouer [this message]
2021-04-26 10:25                 ` Hangbin Liu
2021-04-22  7:14 ` [PATCHv9 bpf-next 3/4] sample/bpf: add xdp_redirect_map_multi for redirect_map broadcast test Hangbin Liu
2021-04-22  7:14 ` [PATCHv9 bpf-next 4/4] selftests/bpf: add xdp_redirect_multi test Hangbin Liu
2021-04-26  9:28   ` Jesper Dangaard Brouer
2021-04-26 10:19     ` Hangbin Liu
2021-04-26 14:29       ` Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210426112308.580cf98e@carbon \
    --to=brouer@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=ast@kernel.org \
    --cc=bjorn.topel@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dsahern@gmail.com \
    --cc=echaudro@redhat.com \
    --cc=jbenc@redhat.com \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=liuhangbin@gmail.com \
    --cc=lorenzo.bianconi@redhat.com \
    --cc=maciej.fijalkowski@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.