From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: David Miller <davem@davemloft.net>,
	netdev@vger.kernel.org,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH net-next 2/2] xdp: Add devmap_idx map type for looking up devices by ifindex
Date: Fri, 22 Feb 2019 10:47:10 +0100
Message-ID: <87y368gnoh.fsf@toke.dk>
In-Reply-To: <20190221163218.72905325@cakuba.netronome.com>

Jakub Kicinski <jakub.kicinski@netronome.com> writes:

> On Fri, 22 Feb 2019 00:02:23 +0100, Toke Høiland-Jørgensen wrote:
>> Jakub Kicinski <jakub.kicinski@netronome.com> writes:
>> 
>> > On Thu, 21 Feb 2019 12:56:54 +0100, Toke Høiland-Jørgensen wrote:  
>> >> A common pattern when using xdp_redirect_map() is to create a device map
>> >> where the lookup key is simply ifindex. Because device maps are arrays,
>> >> this leaves holes in the map, and the map has to be sized to fit the
>> >> largest ifindex, regardless of how many devices are actually
>> >> needed in the map.
>> >> 
>> >> This patch adds a second type of device map where the key is interpreted as
>> >> an ifindex and looked up using a hashmap, instead of being used as an array
>> >> index. This leads to maps being densely packed, so they can be smaller.
>> >> 
>> >> The default maps used by xdp_redirect() are changed to use the new map
>> >> type, which means that xdp_redirect() is no longer limited to ifindex < 64,
>> >> but instead to 64 total simultaneous interfaces per network namespace. This
>> >> also provides an easy way to compare the performance of devmap and
>> >> devmap_idx:
>> >> 
>> >> xdp_redirect_map (devmap): 8394560 pkt/s
>> >> xdp_redirect (devmap_idx): 8179480 pkt/s
>> >> 
>> >> Difference: 215080 pkt/s or 3.1 nanoseconds per packet.  
>> >
>> > Could you share what the ifindex mix was here, to arrive at these
>> > numbers? How does it compare to using an array but not keying with
>> > ifindex?  
>> 
>> Just the standard set on my test machine; ifindex 1 through 9, except 8
>> in this case. So certainly no more than 1 ifindex in each hash bucket
>> for those numbers.
>
> Oh, I clearly misread your numbers, it's still slower than array, you
> just don't need the size limit.

Yeah, this is not about speeding up devmap, it's about lifting the size
restriction.
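
As a sanity check on the figures quoted above, the per-packet cost can be
recomputed from the two packet rates. This is only a sketch: the helper
names ns_per_packet() and extra_ns_per_packet() are made up for
illustration, and the rates plugged in are the ones reported in the patch
description.

```c
#include <assert.h>

/* Time spent per packet, in nanoseconds, at a given packet rate. */
static double ns_per_packet(double pkts_per_sec)
{
	assert(pkts_per_sec > 0.0);
	return 1e9 / pkts_per_sec;
}

/* Extra per-packet cost of the slower path relative to the faster one. */
static double extra_ns_per_packet(double fast_pps, double slow_pps)
{
	return ns_per_packet(slow_pps) - ns_per_packet(fast_pps);
}
```

Calling extra_ns_per_packet(8394560.0, 8179480.0) gives roughly 3.1 ns,
which agrees with the difference stated in the patch description.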

>> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>  
>> >  
>> >> +static int dev_map_idx_update_elem(struct bpf_map *map, void *key, void *value,
>> >> +				   u64 map_flags)
>> >> +{
>> >> +	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
>> >> +	struct bpf_dtab_netdev *dev, *old_dev;
>> >> +	u32 idx = *(u32 *)key;
>> >> +	u32 val = *(u32 *)value;
>> >> +	u32 bit;
>> >> +
>> >> +	if (unlikely(map_flags > BPF_EXIST))
>> >> +		return -EINVAL;
>> >> +	if (unlikely(map_flags == BPF_NOEXIST))
>> >> +		return -EEXIST;
>> >> +
>> >> +	old_dev = __dev_map_idx_lookup_elem(map, idx);
>> >> +	if (!val) {
>> >> +		if (!old_dev)
>> >> +			return 0;  
>> >
>> > IMHO this is a fairly strange mix of array and hashmap semantics. I
>> > think you should stick to hashmap behaviour as far as flags and
>> > update/delete go.
>> 
>> Yeah, the double book-keeping is a bit strange, but it allows the actual
>> forwarding and flush code to be reused between both types of maps. I
>> think this is worth the slight semantic confusion :)
>
> I'm not sure I was clear, let me try again :) Your get_next_key only
> reports existing indexes if I read the code right, so that's not an
> array - in an array indexes always exist. It follows that inserting 0
> should not be equivalent to delete, and BPF_NOEXIST should be handled
> appropriately.

Ah, I see what you mean. Yeah, sure, I guess I can restrict deletion to
only working through explicit delete.

I could also add a fail on NOEXIST, but since each index is tied to a
particular value, you can't actually change the contents of each index,
only insert and remove. So why would you ever set that flag?
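
For comparison, the hashmap-style flag handling being asked for could look
roughly like the sketch below. It is not code from the patch:
sketch_check_flags() is a hypothetical helper, and the SKETCH_BPF_*
constants are local copies of the BPF_ANY, BPF_NOEXIST and BPF_EXIST
values from the UAPI header, redefined so the snippet builds without
kernel headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the update flags from linux/bpf.h (assumed values). */
enum {
	SKETCH_BPF_ANY     = 0,	/* create a new entry or update an existing one */
	SKETCH_BPF_NOEXIST = 1,	/* create only; fail if the key is present */
	SKETCH_BPF_EXIST   = 2,	/* update only; fail if the key is absent */
};

/*
 * Hypothetical helper: decide whether an update may proceed, given
 * whether the key (here: the ifindex) is already in the map.
 */
static int sketch_check_flags(bool exists, uint64_t map_flags)
{
	if (map_flags > SKETCH_BPF_EXIST)
		return -EINVAL;		/* unknown flag value */
	if (exists && map_flags == SKETCH_BPF_NOEXIST)
		return -EEXIST;		/* asked to create, but already there */
	if (!exists && map_flags == SKETCH_BPF_EXIST)
		return -ENOENT;		/* asked to update, but nothing there */
	return 0;
}
```

With a check of this shape, BPF_NOEXIST only fails when the ifindex is
already present, and removal is left entirely to the explicit delete
operation.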

> Different maps behave differently, but I think it's worth trying to limit
> the divergence in how things behave to the basic array and hashmap
> models when possible.

So I don't actually think of this as a hashmap in the general sense;
after all, you can only store ifindexes in it, and key and value are
tied to one another. So it's an ifindex'ed devmap (which is also why I
named it devmap_idx and not devmap_hash); the fact that it's implemented
as a hashmap is just incidental.

So I guess it's a choice between being consistent with the other devmap
type, or with a general hashmap. I'm not actually sure that the latter
is less surprising? :)
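
To make the ifindex-keyed lookup concrete, here is a minimal user-space
sketch of such a table. It is not the patch's implementation: every
sketch_ name, the bucket count and the use of a plain mask as the hash
function are assumptions for illustration, and locking, RCU and deletion
are left out. The table is expected to start out zero-initialised.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DEVS 64	/* capacity: number of devices, not largest ifindex */
#define SKETCH_BUCKETS  64	/* hypothetical bucket count; must be a power of two */

struct sketch_dev {
	uint32_t ifindex;
	int in_use;
	struct sketch_dev *next;	/* next entry in the same bucket */
};

struct sketch_devmap_idx {
	struct sketch_dev *buckets[SKETCH_BUCKETS];
	struct sketch_dev slots[SKETCH_MAX_DEVS];
};

/* Walk the bucket chain for this ifindex; NULL if it is not in the map. */
static struct sketch_dev *sketch_lookup(struct sketch_devmap_idx *m,
					uint32_t ifindex)
{
	struct sketch_dev *d;

	assert(m != NULL);
	d = m->buckets[ifindex & (SKETCH_BUCKETS - 1)];
	while (d && d->ifindex != ifindex)
		d = d->next;
	return d;
}

/* Add an ifindex, taking the first free slot and linking it into its bucket. */
static int sketch_insert(struct sketch_devmap_idx *m, uint32_t ifindex)
{
	uint32_t b = ifindex & (SKETCH_BUCKETS - 1);
	size_t i;

	if (sketch_lookup(m, ifindex))
		return -EEXIST;

	for (i = 0; i < SKETCH_MAX_DEVS; i++) {
		struct sketch_dev *d = &m->slots[i];

		if (d->in_use)
			continue;
		d->ifindex = ifindex;
		d->in_use = 1;
		d->next = m->buckets[b];
		m->buckets[b] = d;
		return 0;
	}
	return -ENOSPC;		/* all SKETCH_MAX_DEVS slots are taken */
}
```

A table like this can hold ifindex 1000 in one of its 64 slots, whereas an
array keyed directly by ifindex would need at least 1001 entries for the
same device.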

-Toke

