Re: [PATCHv9 bpf-next 2/4] xdp: extend xdp_redirect_map with broadcast support

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Hangbin Liu <liuhangbin@gmail.com>
Cc: "Toke Høiland-Jørgensen" <toke@redhat.com>,
	bpf@vger.kernel.org, netdev@vger.kernel.org,
	"Jiri Benc" <jbenc@redhat.com>,
	"Eelco Chaudron" <echaudro@redhat.com>,
	ast@kernel.org, "Daniel Borkmann" <daniel@iogearbox.net>,
	"Lorenzo Bianconi" <lorenzo.bianconi@redhat.com>,
	"David Ahern" <dsahern@gmail.com>,
	"Andrii Nakryiko" <andrii.nakryiko@gmail.com>,
	"Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Maciej Fijalkowski" <maciej.fijalkowski@intel.com>,
	"Björn Töpel" <bjorn.topel@gmail.com>,
	"Martin KaFai Lau" <kafai@fb.com>,
	brouer@redhat.com
Subject: Re: [PATCHv9 bpf-next 2/4] xdp: extend xdp_redirect_map with broadcast support
Date: Sat, 24 Apr 2021 09:01:29 +0200	[thread overview]
Message-ID: <20210424090129.1b8fe377@carbon> (raw)
In-Reply-To: <20210424010925.GG3465@Leo-laptop-t470s>

On Sat, 24 Apr 2021 09:09:25 +0800
Hangbin Liu <liuhangbin@gmail.com> wrote:

> On Fri, Apr 23, 2021 at 06:54:29PM +0200, Jesper Dangaard Brouer wrote:
> > On Thu, 22 Apr 2021 20:02:18 +0200
> > Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >   
> > > Jesper Dangaard Brouer <brouer@redhat.com> writes:
> > >   
> > > > On Thu, 22 Apr 2021 15:14:52 +0800
> > > > Hangbin Liu <liuhangbin@gmail.com> wrote:
> > > >    
> > > >> diff --git a/net/core/filter.c b/net/core/filter.c
> > > >> index cae56d08a670..afec192c3b21 100644
> > > >> --- a/net/core/filter.c
> > > >> +++ b/net/core/filter.c    
> > > > [...]    
> > > >>  int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> > > >>  		    struct bpf_prog *xdp_prog)
> > > >>  {
> > > >> @@ -3933,6 +3950,7 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> > > >>  	enum bpf_map_type map_type = ri->map_type;
> > > >>  	void *fwd = ri->tgt_value;
> > > >>  	u32 map_id = ri->map_id;
> > > >> +	struct bpf_map *map;
> > > >>  	int err;
> > > >>  
> > > >>  	ri->map_id = 0; /* Valid map id idr range: [1,INT_MAX[ */
> > > >> @@ -3942,7 +3960,12 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> > > >>  	case BPF_MAP_TYPE_DEVMAP:
> > > >>  		fallthrough;
> > > >>  	case BPF_MAP_TYPE_DEVMAP_HASH:
> > > >> -		err = dev_map_enqueue(fwd, xdp, dev);
> > > >> +		map = xchg(&ri->map, NULL);    
> > > >
> > > > Hmm, this looks dangerous for performance to have on this fast-path.
> > > > The xchg call can be expensive, AFAIK this is an atomic operation.    
> > > 
> > > Ugh, you're right. That's my bad, I suggested replacing the
> > > READ_ONCE()/WRITE_ONCE() pair with the xchg() because an exchange is
> > > what it's doing, but I failed to consider the performance implications
> > > of the atomic operation. Sorry about that, Hangbin! I guess this should
> > > be changed to:
> > > 
> > > +		map = READ_ONCE(ri->map);
> > > +		if (map) {
> > > +			WRITE_ONCE(ri->map, NULL);
> > > +			err = dev_map_enqueue_multi(xdp, dev, map,
> > > +						    ri->flags & BPF_F_EXCLUDE_INGRESS);
> > > +		} else {
> > > +			err = dev_map_enqueue(fwd, xdp, dev);
> > > +		}  
> > 
> > This is highly sensitive fast-path code, as you saw Bjørn have been
> > hunting nanosec in this area.  The above code implicitly have "map" as
> > the likely option, which I don't think it is.  
> 
> Hi Jesper,
> 
> From the performance data, there is only a slightly impact. Do we still need
> to block the whole patch on this? Or if you have a better solution?

I'm basically just asking you to add an unlikely() annotation:

	map = READ_ONCE(ri->map);
	if (unlikely(map)) {
		WRITE_ONCE(ri->map, NULL);
		err = dev_map_enqueue_multi(xdp, dev, map, [...]

For XDP, performance is the single most important factor!  You say your
performance data, there is only a slightly impact, there must be ZERO
impact (when your added features is not in use).

You data:
 Version          | Test                                | Generic | Native
 5.12 rc4         | redirect_map        i40e->i40e      |    1.9M |  9.6M
 5.12 rc4 + patch | redirect_map        i40e->i40e      |    1.9M |  9.3M

The performance difference 9.6M -> 9.3M is a slowdown of 3.36 nanosec.
Bjørn and others have been working really hard to optimize the code and
remove down to 1.5 nanosec overheads.  Thus, introducing 3.36 nanosec
added overhead to the fast-path is significant.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

next prev parent reply	other threads:[~2021-04-24  7:01 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-22  7:14 [PATCHv9 bpf-next 0/4] xdp: extend xdp_redirect_map with broadcast support Hangbin Liu
2021-04-22  7:14 ` [PATCHv9 bpf-next 1/4] bpf: run devmap xdp_prog on flush instead of bulk enqueue Hangbin Liu
2021-04-22  7:14 ` [PATCHv9 bpf-next 2/4] xdp: extend xdp_redirect_map with broadcast support Hangbin Liu
2021-04-22 16:53   ` Jesper Dangaard Brouer
2021-04-22 18:02     ` Toke Høiland-Jørgensen
2021-04-23 16:54       ` Jesper Dangaard Brouer
2021-04-24  1:09         ` Hangbin Liu
2021-04-24  7:01           ` Jesper Dangaard Brouer [this message]
2021-04-24  9:53             ` Toke Høiland-Jørgensen
2021-04-24 13:55               ` Hangbin Liu
2021-04-26  6:01             ` Hangbin Liu
2021-04-26  9:23               ` Jesper Dangaard Brouer
2021-04-26 10:25                 ` Hangbin Liu
2021-04-22  7:14 ` [PATCHv9 bpf-next 3/4] sample/bpf: add xdp_redirect_map_multi for redirect_map broadcast test Hangbin Liu
2021-04-22  7:14 ` [PATCHv9 bpf-next 4/4] selftests/bpf: add xdp_redirect_multi test Hangbin Liu
2021-04-26  9:28   ` Jesper Dangaard Brouer
2021-04-26 10:19     ` Hangbin Liu
2021-04-26 14:29       ` Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210424090129.1b8fe377@carbon \
    --to=brouer@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=ast@kernel.org \
    --cc=bjorn.topel@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dsahern@gmail.com \
    --cc=echaudro@redhat.com \
    --cc=jbenc@redhat.com \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=liuhangbin@gmail.com \
    --cc=lorenzo.bianconi@redhat.com \
    --cc=maciej.fijalkowski@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).