netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Kicinski <jakub.kicinski@netronome.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: David Miller <davem@davemloft.net>,
	netdev@vger.kernel.org,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH net-next v3 2/3] xdp: Always use a devmap for XDP_REDIRECT to a device
Date: Mon, 4 Mar 2019 09:44:52 -0800	[thread overview]
Message-ID: <20190304094452.375ba6e7@cakuba.netronome.com> (raw)
In-Reply-To: <877edelukk.fsf@toke.dk>

On Mon, 04 Mar 2019 12:58:51 +0100, Toke Høiland-Jørgensen wrote:
> >> diff --git a/include/net/netns/xdp.h b/include/net/netns/xdp.h
> >> index e5734261ba0a..4935dfe1cf43 100644
> >> --- a/include/net/netns/xdp.h
> >> +++ b/include/net/netns/xdp.h
> >> @@ -4,10 +4,21 @@
> >>  
> >>  #include <linux/rculist.h>
> >>  #include <linux/mutex.h>
> >> +#include <linux/atomic.h>
> >> +
> >> +struct bpf_dtab;
> >> +
> >> +struct bpf_dtab_container {
> >> +	struct bpf_dtab __rcu *dtab;
> >> +	atomic_t refcnt;  
> >
> > refcount_t ?  I'm not sure what the rules are for non-performance
> > critical stuff are..  
> 
> I based the use of atomic_t on what bpf/syscall.c is doing. The problem
> with using refcount_t is that it's a bit of a weird refcount because
> it's not strictly tied to the object lifetime (as the objects are only
> allocated if *both* the globals and the per-namespace refcnt are >0).
> 
> >> +static void __dev_map_release_default_map(struct bpf_dtab_container *cont)
> >> +{
> >> +	struct bpf_dtab *dtab = NULL;
> >> +
> >> +	lockdep_assert_held(&dev_map_lock);
> >> +
> >> +	dtab = rcu_dereference(cont->dtab);
> >> +	if (dtab) {  
> >
> > How can it be NULL if it's refcounted? hm..  
> 
> See above; it's not actually the object that is refcounted, it's a count
> of the number of active XDP programs in the namespace, which still needs
> to be kept so we can allocate the devmaps when a REDIRECT program is
> loaded.

I think I got it now.  I wonder if anything could be done to improve
the readability of the dual-refcnting :S  Maybe use the "needed"/"used"
terms throughout?

> >> +		list_del_rcu(&dtab->list);
> >> +		rcu_assign_pointer(cont->dtab, NULL);
> >> +		bpf_clear_redirect_map(&dtab->map);
> >> +		call_rcu(&dtab->rcu, __dev_map_free);
> >> +	}
> >> +}
> >> +
> >> +void dev_map_put_default_map(struct net *net)
> >> +{
> >> +	if (atomic_dec_and_test(&net->xdp.default_map.refcnt)) {
> >> +		spin_lock(&dev_map_lock);
> >> +		__dev_map_release_default_map(&net->xdp.default_map);
> >> +		spin_unlock(&dev_map_lock);
> >> +	}
> >> +}
> >> +
> >> +static int __init_default_map(struct bpf_dtab_container *cont)
> >> +{
> >> +	struct net *net = bpf_default_map_to_net(cont);
> >> +	struct bpf_dtab *dtab, *old_dtab;
> >> +	int size = DEV_MAP_DEFAULT_SIZE;
> >> +	struct net_device *netdev;
> >> +	union bpf_attr attr = {};
> >> +	u32 idx;
> >> +	int err;
> >> +
> >> +	lockdep_assert_held(&dev_map_lock);
> >> +
> >> +	if (!atomic_read(&global_redirect_use))
> >> +		return 0;
> >> +
> >> +	for_each_netdev(net, netdev)
> >> +		if (netdev->ifindex >= size)
> >> +			size <<= 1;
> >> +
> >> +	old_dtab = rcu_dereference(cont->dtab);
> >> +	if (old_dtab && old_dtab->map.max_entries == size)
> >> +		return 0;
> >> +
> >> +	dtab = kzalloc(sizeof(*dtab), GFP_USER);  
> >
> > You are calling this with a spin lock held :(  
> 
> Yeah, not just this, there are more allocations in dev_map_init_map().
> Hmm, I guess I need to think about the concurrency bits some more; it is
> made a bit tricky by the refcounts being outside the objects.
> 
> Basically what I'm trying to protect against is the case where the
> global refcount goes to zero and then immediately back to one (or vice
> versa), which results in a deallocation immediately followed by an
> allocation. I was worried about the allocation and deallocation racing
> with each other (because there's a window after the refcnt is checked
> before the (de)allocation begins). It's not quite clear to me whether
> RCU is enough to protect against this... Thoughts?

Hm.  I think you'll still need a lock (mutex?) on the alloc path, but
the free path should be fine as long as you load the map pointer before
looking at the refcnt (atomic op ensuring the barrier there).

> > Could you please add tests for the replacement combinations?  
> 
> Sure. But, erm, where? selftests? :)

Yes :)  selftests/bpf/  Look around at the scripts, AFAIK we don't have
much in the way of standard infra for system interaction tests like
this :S

  reply	other threads:[~2019-03-04 17:45 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-01 14:12 [PATCH net-next v3 0/3] xdp: Use a default map for xdp_redirect helper Toke Høiland-Jørgensen
2019-03-01 14:12 ` [PATCH net-next v3 2/3] xdp: Always use a devmap for XDP_REDIRECT to a device Toke Høiland-Jørgensen
2019-03-02  2:09   ` Jakub Kicinski
2019-03-04 11:58     ` Toke Høiland-Jørgensen
2019-03-04 17:44       ` Jakub Kicinski [this message]
2019-03-04 19:05         ` Toke Høiland-Jørgensen
2019-03-04 22:15           ` Jakub Kicinski
2019-03-04 22:28             ` Toke Høiland-Jørgensen
2019-03-04 22:49               ` Jakub Kicinski
2019-03-05  9:53                 ` Toke Høiland-Jørgensen
2019-03-01 14:12 ` [PATCH net-next v3 1/3] xdp: Refactor devmap code in preparation for subsequent additions Toke Høiland-Jørgensen
2019-03-02  1:08   ` Jakub Kicinski
2019-03-04 12:47     ` Toke Høiland-Jørgensen
2019-03-04 17:08       ` Jakub Kicinski
2019-03-04 17:37         ` Toke Høiland-Jørgensen
2019-03-01 14:12 ` [PATCH net-next v3 3/3] xdp: Add devmap_idx map type for looking up devices by ifindex Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190304094452.375ba6e7@cakuba.netronome.com \
    --to=jakub.kicinski@netronome.com \
    --cc=ast@kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).