* [PATCH net-next v2 0/4] Couple of BPF refcount fixes for mlx5
@ 2016-11-16 0:04 Daniel Borkmann
2016-11-16 0:04 ` [PATCH net-next v2 1/4] bpf, mlx5: fix mlx5e_create_rq taking reference on prog Daniel Borkmann
` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread
From: Daniel Borkmann @ 2016-11-16 0:04 UTC (permalink / raw)
To: davem
Cc: alexei.starovoitov, bblanco, zhiyisun, ranas, saeedm, netdev,
Daniel Borkmann
Various mlx5 bugs on eBPF refcount handling found during review.
Last patch in series adds a __must_check to BPF helpers to make
sure we won't run into it again w/o compiler complaining first.
v1 -> v2:
- After discussion with Alexei, we agreed upon rebasing the
patches against net-next.
- Since net-next, I've also added the __must_check to enforce
future users to check for errors.
- Fixed up commit message #2.
- Simplify assignment from patch #1 based on Saeed's feedback
on previous set.
Thanks a lot!
Daniel Borkmann (4):
bpf, mlx5: fix mlx5e_create_rq taking reference on prog
bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup
bpf: add __must_check attributes to refcount manipulating helpers
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 41 ++++++++++++++++++-----
include/linux/bpf.h | 12 ++++---
kernel/bpf/syscall.c | 1 +
3 files changed, 40 insertions(+), 14 deletions(-)
--
1.9.3
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH net-next v2 1/4] bpf, mlx5: fix mlx5e_create_rq taking reference on prog
2016-11-16 0:04 [PATCH net-next v2 0/4] Couple of BPF refcount fixes for mlx5 Daniel Borkmann
@ 2016-11-16 0:04 ` Daniel Borkmann
2016-11-16 0:04 ` [PATCH net-next v2 2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set Daniel Borkmann
` (2 subsequent siblings)
3 siblings, 0 replies; 15+ messages in thread
From: Daniel Borkmann @ 2016-11-16 0:04 UTC (permalink / raw)
To: davem
Cc: alexei.starovoitov, bblanco, zhiyisun, ranas, saeedm, netdev,
Daniel Borkmann
In mlx5e_create_rq(), when creating a new queue, we call bpf_prog_add() but
without checking the return value. bpf_prog_add() can fail since 92117d8443bc
("bpf: fix refcnt overflow"), so we really must check it. Take the reference
right when we assign it to the rq from priv->xdp_prog, and just drop the
reference on error path. Destruction in mlx5e_destroy_rq() looks good, though.
Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 13 +++++++++----
kernel/bpf/syscall.c | 1 +
2 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 16a9a10..ab0c336 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -513,7 +513,13 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
rq->channel = c;
rq->ix = c->ix;
rq->priv = c->priv;
- rq->xdp_prog = priv->xdp_prog;
+
+ rq->xdp_prog = priv->xdp_prog ? bpf_prog_inc(priv->xdp_prog) : NULL;
+ if (IS_ERR(rq->xdp_prog)) {
+ err = PTR_ERR(rq->xdp_prog);
+ rq->xdp_prog = NULL;
+ goto err_rq_wq_destroy;
+ }
rq->buff.map_dir = DMA_FROM_DEVICE;
if (rq->xdp_prog)
@@ -590,12 +596,11 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
rq->page_cache.head = 0;
rq->page_cache.tail = 0;
- if (rq->xdp_prog)
- bpf_prog_add(rq->xdp_prog, 1);
-
return 0;
err_rq_wq_destroy:
+ if (rq->xdp_prog)
+ bpf_prog_put(rq->xdp_prog);
mlx5_wq_destroy(&rq->wq_ctrl);
return err;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ce1b7de..eb15498 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -696,6 +696,7 @@ struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog)
{
return bpf_prog_add(prog, 1);
}
+EXPORT_SYMBOL_GPL(bpf_prog_inc);
static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *type)
{
--
1.9.3
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v2 2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
2016-11-16 0:04 [PATCH net-next v2 0/4] Couple of BPF refcount fixes for mlx5 Daniel Borkmann
2016-11-16 0:04 ` [PATCH net-next v2 1/4] bpf, mlx5: fix mlx5e_create_rq taking reference on prog Daniel Borkmann
@ 2016-11-16 0:04 ` Daniel Borkmann
2016-11-16 12:25 ` Saeed Mahameed
2016-11-16 0:04 ` [PATCH net-next v2 3/4] bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup Daniel Borkmann
2016-11-16 0:04 ` [PATCH net-next v2 4/4] bpf: add __must_check attributes to refcount manipulating helpers Daniel Borkmann
3 siblings, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2016-11-16 0:04 UTC (permalink / raw)
To: davem
Cc: alexei.starovoitov, bblanco, zhiyisun, ranas, saeedm, netdev,
Daniel Borkmann
There are multiple issues in mlx5e_xdp_set():
1) The batched bpf_prog_add() is currently not checked for errors! When
doing so, it should be done at an earlier point in time to makes sure
that we cannot fail anymore at the time we want to set the program for
each channel. This only means that we have to undo the bpf_prog_add()
in case we return due to required reset.
2) When swapping the priv->xdp_prog, then no extra reference count must be
taken since we got that from call path via dev_change_xdp_fd() already.
Otherwise, we'd never be able to free the program. Also, bpf_prog_add()
without checking the return code could fail.
Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25 ++++++++++++++++++-----
1 file changed, 20 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ab0c336..cf26672 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3142,6 +3142,17 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
goto unlock;
}
+ if (prog) {
+ /* num_channels is invariant here, so we can take the
+ * batched reference right upfront.
+ */
+ prog = bpf_prog_add(prog, priv->params.num_channels);
+ if (IS_ERR(prog)) {
+ err = PTR_ERR(prog);
+ goto unlock;
+ }
+ }
+
was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
/* no need for full reset when exchanging programs */
reset = (!priv->xdp_prog || !prog);
@@ -3149,10 +3160,10 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
if (was_opened && reset)
mlx5e_close_locked(netdev);
- /* exchange programs */
+ /* exchange programs, extra prog reference we got from caller
+ * as long as we don't fail from this point onwards.
+ */
old_prog = xchg(&priv->xdp_prog, prog);
- if (prog)
- bpf_prog_add(prog, 1);
if (old_prog)
bpf_prog_put(old_prog);
@@ -3163,12 +3174,11 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
mlx5e_open_locked(netdev);
if (!test_bit(MLX5E_STATE_OPENED, &priv->state) || reset)
- goto unlock;
+ goto unlock_put;
/* exchanging programs w/o reset, we update ref counts on behalf
* of the channels RQs here.
*/
- bpf_prog_add(prog, priv->params.num_channels);
for (i = 0; i < priv->params.num_channels; i++) {
struct mlx5e_channel *c = priv->channel[i];
@@ -3190,6 +3200,11 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
unlock:
mutex_unlock(&priv->state_lock);
return err;
+unlock_put:
+ /* reference on priv->xdp_prog is still held at this point */
+ if (prog)
+ bpf_prog_sub(prog, priv->params.num_channels);
+ goto unlock;
}
static bool mlx5e_xdp_attached(struct net_device *dev)
--
1.9.3
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v2 3/4] bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup
2016-11-16 0:04 [PATCH net-next v2 0/4] Couple of BPF refcount fixes for mlx5 Daniel Borkmann
2016-11-16 0:04 ` [PATCH net-next v2 1/4] bpf, mlx5: fix mlx5e_create_rq taking reference on prog Daniel Borkmann
2016-11-16 0:04 ` [PATCH net-next v2 2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set Daniel Borkmann
@ 2016-11-16 0:04 ` Daniel Borkmann
2016-11-16 12:51 ` Saeed Mahameed
2016-11-16 0:04 ` [PATCH net-next v2 4/4] bpf: add __must_check attributes to refcount manipulating helpers Daniel Borkmann
3 siblings, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2016-11-16 0:04 UTC (permalink / raw)
To: davem
Cc: alexei.starovoitov, bblanco, zhiyisun, ranas, saeedm, netdev,
Daniel Borkmann
mlx5e_xdp_set() is currently the only place where we drop reference on the
prog sitting in priv->xdp_prog when it's exchanged by a new one. We also
need to make sure that we eventually release that reference, for example,
in case the netdev is dismantled.
Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index cf26672..60fe54c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3715,6 +3715,9 @@ static void mlx5e_nic_cleanup(struct mlx5e_priv *priv)
if (MLX5_CAP_GEN(mdev, vport_group_manager))
mlx5_eswitch_unregister_vport_rep(esw, 0);
+
+ if (priv->xdp_prog)
+ bpf_prog_put(priv->xdp_prog);
}
static int mlx5e_init_nic_rx(struct mlx5e_priv *priv)
--
1.9.3
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v2 4/4] bpf: add __must_check attributes to refcount manipulating helpers
2016-11-16 0:04 [PATCH net-next v2 0/4] Couple of BPF refcount fixes for mlx5 Daniel Borkmann
` (2 preceding siblings ...)
2016-11-16 0:04 ` [PATCH net-next v2 3/4] bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup Daniel Borkmann
@ 2016-11-16 0:04 ` Daniel Borkmann
3 siblings, 0 replies; 15+ messages in thread
From: Daniel Borkmann @ 2016-11-16 0:04 UTC (permalink / raw)
To: davem
Cc: alexei.starovoitov, bblanco, zhiyisun, ranas, saeedm, netdev,
Daniel Borkmann
Helpers like bpf_prog_add(), bpf_prog_inc(), bpf_map_inc() can fail
with an error, so make sure the caller properly checks their return
value and not just ignores it (which could worst-case lead to use
after free).
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
include/linux/bpf.h | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 01c1487..69d0a7f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -233,14 +233,14 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
struct bpf_prog *bpf_prog_get(u32 ufd);
struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type);
-struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i);
+struct bpf_prog * __must_check bpf_prog_add(struct bpf_prog *prog, int i);
void bpf_prog_sub(struct bpf_prog *prog, int i);
-struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog);
+struct bpf_prog * __must_check bpf_prog_inc(struct bpf_prog *prog);
void bpf_prog_put(struct bpf_prog *prog);
struct bpf_map *bpf_map_get_with_uref(u32 ufd);
struct bpf_map *__bpf_map_get(struct fd f);
-struct bpf_map *bpf_map_inc(struct bpf_map *map, bool uref);
+struct bpf_map * __must_check bpf_map_inc(struct bpf_map *map, bool uref);
void bpf_map_put_with_uref(struct bpf_map *map);
void bpf_map_put(struct bpf_map *map);
int bpf_map_precharge_memlock(u32 pages);
@@ -299,7 +299,8 @@ static inline struct bpf_prog *bpf_prog_get_type(u32 ufd,
{
return ERR_PTR(-EOPNOTSUPP);
}
-static inline struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i)
+static inline struct bpf_prog * __must_check bpf_prog_add(struct bpf_prog *prog,
+ int i)
{
return ERR_PTR(-EOPNOTSUPP);
}
@@ -311,7 +312,8 @@ static inline void bpf_prog_sub(struct bpf_prog *prog, int i)
static inline void bpf_prog_put(struct bpf_prog *prog)
{
}
-static inline struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog)
+
+static inline struct bpf_prog * __must_check bpf_prog_inc(struct bpf_prog *prog)
{
return ERR_PTR(-EOPNOTSUPP);
}
--
1.9.3
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v2 2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
2016-11-16 0:04 ` [PATCH net-next v2 2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set Daniel Borkmann
@ 2016-11-16 12:25 ` Saeed Mahameed
2016-11-16 13:54 ` Daniel Borkmann
0 siblings, 1 reply; 15+ messages in thread
From: Saeed Mahameed @ 2016-11-16 12:25 UTC (permalink / raw)
To: Daniel Borkmann
Cc: David S. Miller, Alexei Starovoitov, Brenden Blanco, zhiyisun,
Rana Shahout, Saeed Mahameed, Linux Netdev List
On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> There are multiple issues in mlx5e_xdp_set():
>
> 1) The batched bpf_prog_add() is currently not checked for errors! When
> doing so, it should be done at an earlier point in time to makes sure
> that we cannot fail anymore at the time we want to set the program for
> each channel. This only means that we have to undo the bpf_prog_add()
> in case we return due to required reset.
>
> 2) When swapping the priv->xdp_prog, then no extra reference count must be
> taken since we got that from call path via dev_change_xdp_fd() already.
> Otherwise, we'd never be able to free the program. Also, bpf_prog_add()
> without checking the return code could fail.
>
> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25 ++++++++++++++++++-----
> 1 file changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index ab0c336..cf26672 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -3142,6 +3142,17 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
> goto unlock;
> }
>
> + if (prog) {
> + /* num_channels is invariant here, so we can take the
> + * batched reference right upfront.
> + */
> + prog = bpf_prog_add(prog, priv->params.num_channels);
> + if (IS_ERR(prog)) {
> + err = PTR_ERR(prog);
> + goto unlock;
> + }
> + }
> +
With this approach you will end up taking a ref count twice per RQ! on
the first time xdp_set is called i.e (old_prog = NULL, prog != NULL).
One ref will be taken per RQ/Channel from the above code, and since
reset will be TRUE mlx5e_open_locked will be called and another ref
count will be taken on mlx5e_create_rq.
The issue here is that we have two places for ref count accounting,
xdp_set and mlx5e_create_rq. Having ref-count updates in
mlx5e_create_rq is essential for num_channels configuration changes
(mlx5e_set_ringparam).
Our previous approach made sure that only one path will do the ref
counting (mlx5e_open_locked vs. mlx5e_xdp_set batch ref update when
reset == false).
We have two options:
1. remove ref count updates from mlx5e_create_rq and only do batch
updates in mlx5e_set_ringparam and mlx5e_xdp_set, the only two places
num_channels/priv->xdp_prog could change.
2. Keep the current approach and make sure to not update the ref count
twice, you can batch update only if (!reset && was_open) otherwise you
can rely on mlx5e_open_locked to take the num_channels ref count for
you.
Personally I prefer option 2, since we will keep the current logic
which will allow configuration updates such as (mlx5e_set_ringparam)
to not worry about ref counting since it will be done in the reset
flow.
Basically we want on xdp_set
-bpf_prog_add(prog, 1) // one ref for the device.
- if not going to reset bpf_prog_add(prog, num_channels) //batch update
- old_prog = xchg(&priv->xdp_prog, prog);
- if going to reset then reset . // reset will handle ref-counting
per channel.
- bpf_prog_put(old_prog) // put one ref for the device.
- if (!reset) bpf_prog_put(old_prog, num_channels) // if we didn't
reset, we need to release ref count on old_prog ourselves
Saeed.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v2 3/4] bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup
2016-11-16 0:04 ` [PATCH net-next v2 3/4] bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup Daniel Borkmann
@ 2016-11-16 12:51 ` Saeed Mahameed
2016-11-16 15:45 ` Daniel Borkmann
0 siblings, 1 reply; 15+ messages in thread
From: Saeed Mahameed @ 2016-11-16 12:51 UTC (permalink / raw)
To: Daniel Borkmann
Cc: David S. Miller, Alexei Starovoitov, Brenden Blanco, zhiyisun,
Rana Shahout, Saeed Mahameed, Linux Netdev List
On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> mlx5e_xdp_set() is currently the only place where we drop reference on the
> prog sitting in priv->xdp_prog when it's exchanged by a new one. We also
> need to make sure that we eventually release that reference, for example,
> in case the netdev is dismantled.
>
> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index cf26672..60fe54c 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -3715,6 +3715,9 @@ static void mlx5e_nic_cleanup(struct mlx5e_priv *priv)
>
> if (MLX5_CAP_GEN(mdev, vport_group_manager))
> mlx5_eswitch_unregister_vport_rep(esw, 0);
> +
> + if (priv->xdp_prog)
> + bpf_prog_put(priv->xdp_prog);
> }
>
I thought that on unregister_netdev ndo_xdp_set will be called with
NULL prog to cleanup. like any other resources (Vlans/mac_lists/
etc..), why xdp should be different ?
Anyway if this is the case, I am ok with this fix, you can even send
it to net (it looks like a serious leak).
Saeed.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v2 2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
2016-11-16 12:25 ` Saeed Mahameed
@ 2016-11-16 13:54 ` Daniel Borkmann
2016-11-16 14:30 ` Saeed Mahameed
0 siblings, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2016-11-16 13:54 UTC (permalink / raw)
To: Saeed Mahameed
Cc: David S. Miller, Alexei Starovoitov, Brenden Blanco, zhiyisun,
Rana Shahout, Saeed Mahameed, Linux Netdev List
On 11/16/2016 01:25 PM, Saeed Mahameed wrote:
> On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> There are multiple issues in mlx5e_xdp_set():
>>
>> 1) The batched bpf_prog_add() is currently not checked for errors! When
>> doing so, it should be done at an earlier point in time to makes sure
>> that we cannot fail anymore at the time we want to set the program for
>> each channel. This only means that we have to undo the bpf_prog_add()
>> in case we return due to required reset.
>>
>> 2) When swapping the priv->xdp_prog, then no extra reference count must be
>> taken since we got that from call path via dev_change_xdp_fd() already.
>> Otherwise, we'd never be able to free the program. Also, bpf_prog_add()
>> without checking the return code could fail.
>>
>> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> ---
>> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25 ++++++++++++++++++-----
>> 1 file changed, 20 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> index ab0c336..cf26672 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> @@ -3142,6 +3142,17 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
>> goto unlock;
>> }
>>
>> + if (prog) {
>> + /* num_channels is invariant here, so we can take the
>> + * batched reference right upfront.
>> + */
>> + prog = bpf_prog_add(prog, priv->params.num_channels);
>> + if (IS_ERR(prog)) {
>> + err = PTR_ERR(prog);
>> + goto unlock;
>> + }
>> + }
>> +
>
> With this approach you will end up taking a ref count twice per RQ! on
> the first time xdp_set is called i.e (old_prog = NULL, prog != NULL).
> One ref will be taken per RQ/Channel from the above code, and since
> reset will be TRUE mlx5e_open_locked will be called and another ref
> count will be taken on mlx5e_create_rq.
>
> The issue here is that we have two places for ref count accounting,
> xdp_set and mlx5e_create_rq. Having ref-count updates in
> mlx5e_create_rq is essential for num_channels configuration changes
> (mlx5e_set_ringparam).
>
> Our previous approach made sure that only one path will do the ref
> counting (mlx5e_open_locked vs. mlx5e_xdp_set batch ref update when
> reset == false).
That is correct, for a short time bpf_prog_add() was charged also when
we reset. When we reset, we will then jump to unlock_put and safely undo
this since we took ref from mlx5e_create_rq() already in that case, and
return successfully. That was intentional, so that eventually we end up
just taking a /single/ ref per RQ when we exit mlx5e_xdp_set(), but more
below ...
[...]
> 2. Keep the current approach and make sure to not update the ref count
> twice, you can batch update only if (!reset && was_open) otherwise you
> can rely on mlx5e_open_locked to take the num_channels ref count for
> you.
>
> Personally I prefer option 2, since we will keep the current logic
> which will allow configuration updates such as (mlx5e_set_ringparam)
> to not worry about ref counting since it will be done in the reset
> flow.
... agree on keeping current approach. I actually like the idea, so we'd
end up with this simpler version for the batched ref then.
Note, your "bpf_prog_add(prog, 1) // one ref for the device." is not needed
since this we already got that one through dev_change_xdp_fd() as mentioned.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ab0c336..09ac27c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3148,11 +3148,21 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
if (was_opened && reset)
mlx5e_close_locked(netdev);
+ if (was_opened && !reset) {
+ /* num_channels is invariant here, so we can take the
+ * batched reference right upfront.
+ */
+ prog = bpf_prog_add(prog, priv->params.num_channels);
+ if (IS_ERR(prog)) {
+ err = PTR_ERR(prog);
+ goto unlock;
+ }
+ }
- /* exchange programs */
+ /* exchange programs, extra prog reference we got from caller
+ * as long as we don't fail from this point onwards.
+ */
old_prog = xchg(&priv->xdp_prog, prog);
- if (prog)
- bpf_prog_add(prog, 1);
if (old_prog)
bpf_prog_put(old_prog);
@@ -3168,7 +3178,6 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
/* exchanging programs w/o reset, we update ref counts on behalf
* of the channels RQs here.
*/
- bpf_prog_add(prog, priv->params.num_channels);
for (i = 0; i < priv->params.num_channels; i++) {
struct mlx5e_channel *c = priv->channel[i];
--
1.9.3
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v2 2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
2016-11-16 13:54 ` Daniel Borkmann
@ 2016-11-16 14:30 ` Saeed Mahameed
2016-11-16 15:26 ` Daniel Borkmann
0 siblings, 1 reply; 15+ messages in thread
From: Saeed Mahameed @ 2016-11-16 14:30 UTC (permalink / raw)
To: Daniel Borkmann
Cc: David S. Miller, Alexei Starovoitov, Brenden Blanco, zhiyisun,
Rana Shahout, Saeed Mahameed, Linux Netdev List
On Wed, Nov 16, 2016 at 3:54 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 11/16/2016 01:25 PM, Saeed Mahameed wrote:
>>
>> On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox.net>
>> wrote:
>>>
>>> There are multiple issues in mlx5e_xdp_set():
>>>
>>> 1) The batched bpf_prog_add() is currently not checked for errors! When
>>> doing so, it should be done at an earlier point in time to makes sure
>>> that we cannot fail anymore at the time we want to set the program
>>> for
>>> each channel. This only means that we have to undo the bpf_prog_add()
>>> in case we return due to required reset.
>>>
>>> 2) When swapping the priv->xdp_prog, then no extra reference count must
>>> be
>>> taken since we got that from call path via dev_change_xdp_fd()
>>> already.
>>> Otherwise, we'd never be able to free the program. Also,
>>> bpf_prog_add()
>>> without checking the return code could fail.
>>>
>>> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>>> ---
>>> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25
>>> ++++++++++++++++++-----
>>> 1 file changed, 20 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> index ab0c336..cf26672 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> @@ -3142,6 +3142,17 @@ static int mlx5e_xdp_set(struct net_device
>>> *netdev, struct bpf_prog *prog)
>>> goto unlock;
>>> }
>>>
>>> + if (prog) {
>>> + /* num_channels is invariant here, so we can take the
>>> + * batched reference right upfront.
>>> + */
>>> + prog = bpf_prog_add(prog, priv->params.num_channels);
>>> + if (IS_ERR(prog)) {
>>> + err = PTR_ERR(prog);
>>> + goto unlock;
>>> + }
>>> + }
>>> +
>>
>>
>> With this approach you will end up taking a ref count twice per RQ! on
>> the first time xdp_set is called i.e (old_prog = NULL, prog != NULL).
>> One ref will be taken per RQ/Channel from the above code, and since
>> reset will be TRUE mlx5e_open_locked will be called and another ref
>> count will be taken on mlx5e_create_rq.
>>
>> The issue here is that we have two places for ref count accounting,
>> xdp_set and mlx5e_create_rq. Having ref-count updates in
>> mlx5e_create_rq is essential for num_channels configuration changes
>> (mlx5e_set_ringparam).
>>
>> Our previous approach made sure that only one path will do the ref
>> counting (mlx5e_open_locked vs. mlx5e_xdp_set batch ref update when
>> reset == false).
>
>
> That is correct, for a short time bpf_prog_add() was charged also when
> we reset. When we reset, we will then jump to unlock_put and safely undo
> this since we took ref from mlx5e_create_rq() already in that case, and
> return successfully. That was intentional, so that eventually we end up
> just taking a /single/ ref per RQ when we exit mlx5e_xdp_set(), but more
> below ...
>
> [...]
>>
>> 2. Keep the current approach and make sure to not update the ref count
>> twice, you can batch update only if (!reset && was_open) otherwise you
>> can rely on mlx5e_open_locked to take the num_channels ref count for
>> you.
>>
>> Personally I prefer option 2, since we will keep the current logic
>> which will allow configuration updates such as (mlx5e_set_ringparam)
>> to not worry about ref counting since it will be done in the reset
>> flow.
>
>
> ... agree on keeping current approach. I actually like the idea, so we'd
> end up with this simpler version for the batched ref then.
>
Great :).
So let's do the batched update only if we are not going to reset (we
already know that in advance), instead of the current patch where you
batch update unconditionally and then
unlock_put in case reset was performed (which is just redundant and confusing).
Please note that if open_locked fails you need to goto unlock_put.
> Note, your "bpf_prog_add(prog, 1) // one ref for the device." is not needed
> since this we already got that one through dev_change_xdp_fd() as mentioned.
>
If it is not needed then why we need bpf_prog_put on mlx5e_nic_cleanup
in your next patch? this doesn't look symmetric (right) !
you shouldn't release a reference you didn't take.
Overall with this series the driver can take num_channels refs and
will release num_channels refs on mlx5e_close. we shouldn't release
one extra ref on NIC cleanup.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v2 2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
2016-11-16 14:30 ` Saeed Mahameed
@ 2016-11-16 15:26 ` Daniel Borkmann
2016-11-17 9:48 ` Saeed Mahameed
0 siblings, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2016-11-16 15:26 UTC (permalink / raw)
To: Saeed Mahameed
Cc: David S. Miller, Alexei Starovoitov, Brenden Blanco, zhiyisun,
Rana Shahout, Saeed Mahameed, Linux Netdev List
On 11/16/2016 03:30 PM, Saeed Mahameed wrote:
> On Wed, Nov 16, 2016 at 3:54 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 11/16/2016 01:25 PM, Saeed Mahameed wrote:
>>> On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox.net>
>>> wrote:
>>>>
>>>> There are multiple issues in mlx5e_xdp_set():
>>>>
>>>> 1) The batched bpf_prog_add() is currently not checked for errors! When
>>>> doing so, it should be done at an earlier point in time to makes sure
>>>> that we cannot fail anymore at the time we want to set the program
>>>> for
>>>> each channel. This only means that we have to undo the bpf_prog_add()
>>>> in case we return due to required reset.
>>>>
>>>> 2) When swapping the priv->xdp_prog, then no extra reference count must
>>>> be
>>>> taken since we got that from call path via dev_change_xdp_fd()
>>>> already.
>>>> Otherwise, we'd never be able to free the program. Also,
>>>> bpf_prog_add()
>>>> without checking the return code could fail.
>>>>
>>>> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
>>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>>>> ---
>>>> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25
>>>> ++++++++++++++++++-----
>>>> 1 file changed, 20 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> index ab0c336..cf26672 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> @@ -3142,6 +3142,17 @@ static int mlx5e_xdp_set(struct net_device
>>>> *netdev, struct bpf_prog *prog)
>>>> goto unlock;
>>>> }
>>>>
>>>> + if (prog) {
>>>> + /* num_channels is invariant here, so we can take the
>>>> + * batched reference right upfront.
>>>> + */
>>>> + prog = bpf_prog_add(prog, priv->params.num_channels);
>>>> + if (IS_ERR(prog)) {
>>>> + err = PTR_ERR(prog);
>>>> + goto unlock;
>>>> + }
>>>> + }
>>>> +
>>>
>>> With this approach you will end up taking a ref count twice per RQ! on
>>> the first time xdp_set is called i.e (old_prog = NULL, prog != NULL).
>>> One ref will be taken per RQ/Channel from the above code, and since
>>> reset will be TRUE mlx5e_open_locked will be called and another ref
>>> count will be taken on mlx5e_create_rq.
>>>
>>> The issue here is that we have two places for ref count accounting,
>>> xdp_set and mlx5e_create_rq. Having ref-count updates in
>>> mlx5e_create_rq is essential for num_channels configuration changes
>>> (mlx5e_set_ringparam).
>>>
>>> Our previous approach made sure that only one path will do the ref
>>> counting (mlx5e_open_locked vs. mlx5e_xdp_set batch ref update when
>>> reset == false).
>>
>> That is correct, for a short time bpf_prog_add() was charged also when
>> we reset. When we reset, we will then jump to unlock_put and safely undo
>> this since we took ref from mlx5e_create_rq() already in that case, and
>> return successfully. That was intentional, so that eventually we end up
>> just taking a /single/ ref per RQ when we exit mlx5e_xdp_set(), but more
>> below ...
>>
>> [...]
>>>
>>> 2. Keep the current approach and make sure to not update the ref count
>>> twice, you can batch update only if (!reset && was_open) otherwise you
>>> can rely on mlx5e_open_locked to take the num_channels ref count for
>>> you.
>>>
>>> Personally I prefer option 2, since we will keep the current logic
>>> which will allow configuration updates such as (mlx5e_set_ringparam)
>>> to not worry about ref counting since it will be done in the reset
>>> flow.
>>
>> ... agree on keeping current approach. I actually like the idea, so we'd
>> end up with this simpler version for the batched ref then.
>
> Great :).
>
> So let's do the batched update only if we are not going to reset (we
> already know that in advance), instead of the current patch where you
> batch update unconditionally and then
> unlock_put in case reset was performed (which is just redundant and confusing).
>
> Please note that if open_locked fails you need to goto unlock_put.
Sorry, I'm not quite sure I follow you here; are you /now/ commenting on
the original patch or on the already updated diff I did from my very last
email, that is, http://patchwork.ozlabs.org/patch/695564/ ?
>> Note, your "bpf_prog_add(prog, 1) // one ref for the device." is not needed
>> since this we already got that one through dev_change_xdp_fd() as mentioned.
>
> If it is not needed then why we need bpf_prog_put on mlx5e_nic_cleanup
> in your next patch? this doesn't look symmetric (right) !
> you shouldn't release a reference you didn't take.
> Overall with this series the driver can take num_channels refs and
> will release num_channels refs on mlx5e_close. we shouldn't release
> one extra ref on NIC cleanup.
I already explained this in the commit description; when dev_change_xdp_fd()
is called and sees a valid fd, it does bpf_prog_get_type(), which calls the
__bpf_prog_get() taking a ref on the program (bpf_prog_inc()). That is then
passed down to ops->ndo_xdp(). Only in case of error from the ->ndo_xdp()
callback, we bpf_prog_put() this reference from dev_change_xdp_fd() side.
For drivers that implement against this ndo, it means that you need N-1 refs
in addition. Have a look at the other two in-tree users, which do it correctly,
that is mlx4_xdp_set() and nfp_net_xdp_setup().
It's documented here (include/linux/netdevice.h) with ndo_xdp referring to it:
/* These structures hold the attributes of xdp state that are being passed
* to the netdevice through the xdp op.
*/
enum xdp_netdev_command {
/* Set or clear a bpf program used in the earliest stages of packet
* rx. The prog will have been loaded as BPF_PROG_TYPE_XDP. The callee
* is responsible for calling bpf_prog_put on any old progs that are
* stored. In case of error, the callee need not release the new prog
* reference, but on success it takes ownership and must bpf_prog_put
* when it is no longer used.
*/
XDP_SETUP_PROG,
[...]
};
I think reason why this is rather confusing would be that initially, it was
just a single prog for all queues, but it was requested along the way to move
prog pointer down to queues instead and let them have their own ref, so that
some time in future individual progs from a subset of the queues can be
exchanged.
Since the way xdp in mlx5 is implemented, you currently have the priv->xdp_prog
as the control prog. That's okay right now, but requires to drop the last ref
on priv->xdp_prog via bpf_prog_put() when netdev is dismantled and priv->xdp_prog
still present.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v2 3/4] bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup
2016-11-16 12:51 ` Saeed Mahameed
@ 2016-11-16 15:45 ` Daniel Borkmann
2016-11-16 15:55 ` Daniel Borkmann
0 siblings, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2016-11-16 15:45 UTC (permalink / raw)
To: Saeed Mahameed
Cc: David S. Miller, Alexei Starovoitov, Brenden Blanco, zhiyisun,
Rana Shahout, Saeed Mahameed, Linux Netdev List
On 11/16/2016 01:51 PM, Saeed Mahameed wrote:
> On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> mlx5e_xdp_set() is currently the only place where we drop reference on the
>> prog sitting in priv->xdp_prog when it's exchanged by a new one. We also
>> need to make sure that we eventually release that reference, for example,
>> in case the netdev is dismantled.
>>
>> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> ---
>> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> index cf26672..60fe54c 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> @@ -3715,6 +3715,9 @@ static void mlx5e_nic_cleanup(struct mlx5e_priv *priv)
>>
>> if (MLX5_CAP_GEN(mdev, vport_group_manager))
>> mlx5_eswitch_unregister_vport_rep(esw, 0);
>> +
>> + if (priv->xdp_prog)
>> + bpf_prog_put(priv->xdp_prog);
>> }
>
> I thought that on unregister_netdev ndo_xdp_set will be called with
> NULL prog to cleanup. like any other resources (Vlans/mac_lists/
> etc..), why xdp should be different ?
> Anyway if this is the case, I am ok with this fix, you can even send
> it to net (it looks like a serious leak).
The only interaction with ndo_xdp() right now is dev_change_xdp_fd()
and the currently a bit terse dump via rtnl_xdp_fill(). The latter
only tells whether something is actually attached and will have more
info in near future, but doesn't alter anything.
dev_change_xdp_fd() is only triggered from user side via netlink when
IFLA_XDP container attr is around, so no automatic cleanup here. This
means as per documentation in enum xdp_netdev_command, that the driver
has full ownership, thus needs to bpf_prog_put().
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v2 3/4] bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup
2016-11-16 15:45 ` Daniel Borkmann
@ 2016-11-16 15:55 ` Daniel Borkmann
2016-11-17 9:34 ` Saeed Mahameed
0 siblings, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2016-11-16 15:55 UTC (permalink / raw)
To: Saeed Mahameed
Cc: David S. Miller, Alexei Starovoitov, Brenden Blanco, zhiyisun,
Rana Shahout, Saeed Mahameed, Linux Netdev List
On 11/16/2016 04:45 PM, Daniel Borkmann wrote:
> On 11/16/2016 01:51 PM, Saeed Mahameed wrote:
>> On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>>> mlx5e_xdp_set() is currently the only place where we drop reference on the
>>> prog sitting in priv->xdp_prog when it's exchanged by a new one. We also
>>> need to make sure that we eventually release that reference, for example,
>>> in case the netdev is dismantled.
>>>
>>> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>>> ---
>>> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> index cf26672..60fe54c 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> @@ -3715,6 +3715,9 @@ static void mlx5e_nic_cleanup(struct mlx5e_priv *priv)
>>>
>>> if (MLX5_CAP_GEN(mdev, vport_group_manager))
>>> mlx5_eswitch_unregister_vport_rep(esw, 0);
>>> +
>>> + if (priv->xdp_prog)
>>> + bpf_prog_put(priv->xdp_prog);
>>> }
>>
>> I thought that on unregister_netdev ndo_xdp_set will be called with
>> NULL prog to cleanup. like any other resources (Vlans/mac_lists/
>> etc..), why xdp should be different ?
>> Anyway if this is the case, I am ok with this fix, you can even send
>> it to net (it looks like a serious leak).
>
> The only interaction with ndo_xdp() right now is dev_change_xdp_fd()
> and the currently a bit terse dump via rtnl_xdp_fill(). The latter
> only tells whether something is actually attached and will have more
> info in near future, but doesn't alter anything.
>
> dev_change_xdp_fd() is only triggered from user side via netlink when
> IFLA_XDP container attr is around, so no automatic cleanup here. This
> means as per documentation in enum xdp_netdev_command, that the driver
> has full ownership, thus needs to bpf_prog_put().
Note that without patch 2/4, just sending this one to net doesn't really
solve anything, since there the mlx5e_xdp_set() still has the incorrect
bpf_prog_add(prog, 1) around. So it's the whole series if so. I had it
originally targeted at net, but Alexei suggested net-next; I don't really
mind either way, so I agreed to go for net-next.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v2 3/4] bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup
2016-11-16 15:55 ` Daniel Borkmann
@ 2016-11-17 9:34 ` Saeed Mahameed
0 siblings, 0 replies; 15+ messages in thread
From: Saeed Mahameed @ 2016-11-17 9:34 UTC (permalink / raw)
To: Daniel Borkmann
Cc: David S. Miller, Alexei Starovoitov, Brenden Blanco, zhiyisun,
Rana Shahout, Saeed Mahameed, Linux Netdev List
On Wed, Nov 16, 2016 at 5:55 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 11/16/2016 04:45 PM, Daniel Borkmann wrote:
>>
>> On 11/16/2016 01:51 PM, Saeed Mahameed wrote:
>>>
>>> On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox.net>
>>> wrote:
>>>>
>>>> mlx5e_xdp_set() is currently the only place where we drop reference on
>>>> the
>>>> prog sitting in priv->xdp_prog when it's exchanged by a new one. We also
>>>> need to make sure that we eventually release that reference, for
>>>> example,
>>>> in case the netdev is dismantled.
>>>>
>>>> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
>>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>>>> ---
>>>> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> index cf26672..60fe54c 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> @@ -3715,6 +3715,9 @@ static void mlx5e_nic_cleanup(struct mlx5e_priv
>>>> *priv)
>>>>
>>>> if (MLX5_CAP_GEN(mdev, vport_group_manager))
>>>> mlx5_eswitch_unregister_vport_rep(esw, 0);
>>>> +
>>>> + if (priv->xdp_prog)
>>>> + bpf_prog_put(priv->xdp_prog);
>>>> }
>>>
>>>
>>> I thought that on unregister_netdev ndo_xdp_set will be called with
>>> NULL prog to cleanup. like any other resources (Vlans/mac_lists/
>>> etc..), why xdp should be different ?
>>> Anyway if this is the case, I am ok with this fix, you can even send
>>> it to net (it looks like a serious leak).
>>
>>
>> The only interaction with ndo_xdp() right now is dev_change_xdp_fd()
>> and the currently a bit terse dump via rtnl_xdp_fill(). The latter
>> only tells whether something is actually attached and will have more
>> info in near future, but doesn't alter anything.
>>
>> dev_change_xdp_fd() is only triggered from user side via netlink when
>> IFLA_XDP container attr is around, so no automatic cleanup here. This
>> means as per documentation in enum xdp_netdev_command, that the driver
>> has full ownership, thus needs to bpf_prog_put().
>
>
> Note that without patch 2/4, just sending this one to net doesn't really
> solve anything, since there the mlx5e_xdp_set() still has the incorrect
> bpf_prog_add(prog, 1) around. So it's the whole series if so. I had it
> originally targeted at net, but Alexei suggested net-next; I don't really
> mind either way, so I agreed to go for net-next.
Ok, Thank you Daniel
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v2 2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
2016-11-16 15:26 ` Daniel Borkmann
@ 2016-11-17 9:48 ` Saeed Mahameed
2016-11-17 20:03 ` Daniel Borkmann
0 siblings, 1 reply; 15+ messages in thread
From: Saeed Mahameed @ 2016-11-17 9:48 UTC (permalink / raw)
To: Daniel Borkmann
Cc: David S. Miller, Alexei Starovoitov, Brenden Blanco, Zhiyi Sun,
Rana Shahout, Saeed Mahameed, Linux Netdev List
On Wed, Nov 16, 2016 at 5:26 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 11/16/2016 03:30 PM, Saeed Mahameed wrote:
>>
>> On Wed, Nov 16, 2016 at 3:54 PM, Daniel Borkmann <daniel@iogearbox.net>
>> wrote:
>>>
>>> On 11/16/2016 01:25 PM, Saeed Mahameed wrote:
>>>>
>>>> On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox.net>
>>>> wrote:
>>>>>
>>>>>
>>>>> There are multiple issues in mlx5e_xdp_set():
>>>>>
>>>>> 1) The batched bpf_prog_add() is currently not checked for errors! When
>>>>> doing so, it should be done at an earlier point in time to makes
>>>>> sure
>>>>> that we cannot fail anymore at the time we want to set the program
>>>>> for
>>>>> each channel. This only means that we have to undo the
>>>>> bpf_prog_add()
>>>>> in case we return due to required reset.
>>>>>
>>>>> 2) When swapping the priv->xdp_prog, then no extra reference count must
>>>>> be
>>>>> taken since we got that from call path via dev_change_xdp_fd()
>>>>> already.
>>>>> Otherwise, we'd never be able to free the program. Also,
>>>>> bpf_prog_add()
>>>>> without checking the return code could fail.
>>>>>
>>>>> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs
>>>>> support")
>>>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>>>>> ---
>>>>> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25
>>>>> ++++++++++++++++++-----
>>>>> 1 file changed, 20 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>> index ab0c336..cf26672 100644
>>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>> @@ -3142,6 +3142,17 @@ static int mlx5e_xdp_set(struct net_device
>>>>> *netdev, struct bpf_prog *prog)
>>>>> goto unlock;
>>>>> }
>>>>>
>>>>> + if (prog) {
>>>>> + /* num_channels is invariant here, so we can take the
>>>>> + * batched reference right upfront.
>>>>> + */
>>>>> + prog = bpf_prog_add(prog, priv->params.num_channels);
>>>>> + if (IS_ERR(prog)) {
>>>>> + err = PTR_ERR(prog);
>>>>> + goto unlock;
>>>>> + }
>>>>> + }
>>>>> +
>>>>
>>>>
>>>> With this approach you will end up taking a ref count twice per RQ! on
>>>> the first time xdp_set is called i.e (old_prog = NULL, prog != NULL).
>>>> One ref will be taken per RQ/Channel from the above code, and since
>>>> reset will be TRUE mlx5e_open_locked will be called and another ref
>>>> count will be taken on mlx5e_create_rq.
>>>>
>>>> The issue here is that we have two places for ref count accounting,
>>>> xdp_set and mlx5e_create_rq. Having ref-count updates in
>>>> mlx5e_create_rq is essential for num_channels configuration changes
>>>> (mlx5e_set_ringparam).
>>>>
>>>> Our previous approach made sure that only one path will do the ref
>>>> counting (mlx5e_open_locked vs. mlx5e_xdp_set batch ref update when
>>>> reset == false).
>>>
>>>
>>> That is correct, for a short time bpf_prog_add() was charged also when
>>> we reset. When we reset, we will then jump to unlock_put and safely undo
>>> this since we took ref from mlx5e_create_rq() already in that case, and
>>> return successfully. That was intentional, so that eventually we end up
>>> just taking a /single/ ref per RQ when we exit mlx5e_xdp_set(), but more
>>> below ...
>>>
>>> [...]
>>>>
>>>>
>>>> 2. Keep the current approach and make sure to not update the ref count
>>>> twice, you can batch update only if (!reset && was_open) otherwise you
>>>> can rely on mlx5e_open_locked to take the num_channels ref count for
>>>> you.
>>>>
>>>> Personally I prefer option 2, since we will keep the current logic
>>>> which will allow configuration updates such as (mlx5e_set_ringparam)
>>>> to not worry about ref counting since it will be done in the reset
>>>> flow.
>>>
>>>
>>> ... agree on keeping current approach. I actually like the idea, so we'd
>>> end up with this simpler version for the batched ref then.
>>
>>
>> Great :).
>>
>> So let's do the batched update only if we are not going to reset (we
>> already know that in advance), instead of the current patch where you
>> batch update unconditionally and then
>> unlock_put in case reset was performed (which is just redundant and
>> confusing).
>>
>> Please note that if open_locked fails you need to goto unlock_put.
>
>
> Sorry, I'm not quite sure I follow you here; are you /now/ commenting on
> the original patch or on the already updated diff I did from my very last
> email, that is, http://patchwork.ozlabs.org/patch/695564/ ?
>
I am commenting on this patch "V2 2/4".
this patch will take batched ref count unconditionally and release the
extra refs "unlock_put" if reset was performed.
I am asking to take the batched ref only if we are not going to reset
in advance to save the extra "unlock_put"
>>> Note, your "bpf_prog_add(prog, 1) // one ref for the device." is not
>>> needed
>>> since this we already got that one through dev_change_xdp_fd() as
>>> mentioned.
>>
>>
>> If it is not needed then why we need bpf_prog_put on mlx5e_nic_cleanup
>> in your next patch? this doesn't look symmetric (right) !
>> you shouldn't release a reference you didn't take.
>> Overall with this series the driver can take num_channels refs and
>> will release num_channels refs on mlx5e_close. we shouldn't release
>> one extra ref on NIC cleanup.
>
>
> I already explained this in the commit description; when dev_change_xdp_fd()
> is called and sees a valid fd, it does bpf_prog_get_type(), which calls the
> __bpf_prog_get() taking a ref on the program (bpf_prog_inc()). That is then
> passed down to ops->ndo_xdp(). Only in case of error from the ->ndo_xdp()
> callback, we bpf_prog_put() this reference from dev_change_xdp_fd() side.
>
> For drivers that implement against this ndo, it means that you need N-1 refs
> in addition. Have a look at the other two in-tree users, which do it
> correctly,
> that is mlx4_xdp_set() and nfp_net_xdp_setup().
>
> It's documented here (include/linux/netdevice.h) with ndo_xdp referring to
> it:
>
> /* These structures hold the attributes of xdp state that are being passed
> * to the netdevice through the xdp op.
> */
> enum xdp_netdev_command {
> /* Set or clear a bpf program used in the earliest stages of packet
> * rx. The prog will have been loaded as BPF_PROG_TYPE_XDP. The
> callee
> * is responsible for calling bpf_prog_put on any old progs that are
> * stored. In case of error, the callee need not release the new
> prog
> * reference, but on success it takes ownership and must
> bpf_prog_put
> * when it is no longer used.
> */
> XDP_SETUP_PROG,
> [...]
> };
>
> I think reason why this is rather confusing would be that initially, it was
> just a single prog for all queues, but it was requested along the way to
> move
> prog pointer down to queues instead and let them have their own ref, so that
> some time in future individual progs from a subset of the queues can be
> exchanged.
>
> Since the way xdp in mlx5 is implemented, you currently have the
> priv->xdp_prog
> as the control prog. That's okay right now, but requires to drop the last
> ref
> on priv->xdp_prog via bpf_prog_put() when netdev is dismantled and
> priv->xdp_prog
> still present.
Got it, but i would rather make the stack cleanup those refs once it
receives an unregester_netdev event instead of counting on netdev to
do it, as i said before, like any other resource added to the netdev
by the stack (vlan/mac/etc..).
but this is a general discussion of xdp API, it won't block this series :).
Saeed.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v2 2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
2016-11-17 9:48 ` Saeed Mahameed
@ 2016-11-17 20:03 ` Daniel Borkmann
0 siblings, 0 replies; 15+ messages in thread
From: Daniel Borkmann @ 2016-11-17 20:03 UTC (permalink / raw)
To: Saeed Mahameed
Cc: David S. Miller, Alexei Starovoitov, Brenden Blanco, Zhiyi Sun,
Rana Shahout, Saeed Mahameed, Linux Netdev List
On 11/17/2016 10:48 AM, Saeed Mahameed wrote:
> On Wed, Nov 16, 2016 at 5:26 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 11/16/2016 03:30 PM, Saeed Mahameed wrote:
>>> On Wed, Nov 16, 2016 at 3:54 PM, Daniel Borkmann <daniel@iogearbox.net>
>>> wrote:
>>>> On 11/16/2016 01:25 PM, Saeed Mahameed wrote:
>>>>> On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox.net>
>>>>> wrote:
>>>>>>
>>>>>> There are multiple issues in mlx5e_xdp_set():
>>>>>>
>>>>>> 1) The batched bpf_prog_add() is currently not checked for errors! When
>>>>>> doing so, it should be done at an earlier point in time to makes
>>>>>> sure
>>>>>> that we cannot fail anymore at the time we want to set the program
>>>>>> for
>>>>>> each channel. This only means that we have to undo the
>>>>>> bpf_prog_add()
>>>>>> in case we return due to required reset.
>>>>>>
>>>>>> 2) When swapping the priv->xdp_prog, then no extra reference count must
>>>>>> be
>>>>>> taken since we got that from call path via dev_change_xdp_fd()
>>>>>> already.
>>>>>> Otherwise, we'd never be able to free the program. Also,
>>>>>> bpf_prog_add()
>>>>>> without checking the return code could fail.
>>>>>>
>>>>>> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs
>>>>>> support")
>>>>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>>>>>> ---
>>>>>> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25
>>>>>> ++++++++++++++++++-----
>>>>>> 1 file changed, 20 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>>> index ab0c336..cf26672 100644
>>>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>>> @@ -3142,6 +3142,17 @@ static int mlx5e_xdp_set(struct net_device
>>>>>> *netdev, struct bpf_prog *prog)
>>>>>> goto unlock;
>>>>>> }
>>>>>>
>>>>>> + if (prog) {
>>>>>> + /* num_channels is invariant here, so we can take the
>>>>>> + * batched reference right upfront.
>>>>>> + */
>>>>>> + prog = bpf_prog_add(prog, priv->params.num_channels);
>>>>>> + if (IS_ERR(prog)) {
>>>>>> + err = PTR_ERR(prog);
>>>>>> + goto unlock;
>>>>>> + }
>>>>>> + }
>>>>>> +
>>>>>
>>>>> With this approach you will end up taking a ref count twice per RQ! on
>>>>> the first time xdp_set is called i.e (old_prog = NULL, prog != NULL).
>>>>> One ref will be taken per RQ/Channel from the above code, and since
>>>>> reset will be TRUE mlx5e_open_locked will be called and another ref
>>>>> count will be taken on mlx5e_create_rq.
>>>>>
>>>>> The issue here is that we have two places for ref count accounting,
>>>>> xdp_set and mlx5e_create_rq. Having ref-count updates in
>>>>> mlx5e_create_rq is essential for num_channels configuration changes
>>>>> (mlx5e_set_ringparam).
>>>>>
>>>>> Our previous approach made sure that only one path will do the ref
>>>>> counting (mlx5e_open_locked vs. mlx5e_xdp_set batch ref update when
>>>>> reset == false).
>>>>
>>>> That is correct, for a short time bpf_prog_add() was charged also when
>>>> we reset. When we reset, we will then jump to unlock_put and safely undo
>>>> this since we took ref from mlx5e_create_rq() already in that case, and
>>>> return successfully. That was intentional, so that eventually we end up
>>>> just taking a /single/ ref per RQ when we exit mlx5e_xdp_set(), but more
>>>> below ...
>>>>
>>>> [...]
>>>>>
>>>>> 2. Keep the current approach and make sure to not update the ref count
>>>>> twice, you can batch update only if (!reset && was_open) otherwise you
>>>>> can rely on mlx5e_open_locked to take the num_channels ref count for
>>>>> you.
>>>>>
>>>>> Personally I prefer option 2, since we will keep the current logic
>>>>> which will allow configuration updates such as (mlx5e_set_ringparam)
>>>>> to not worry about ref counting since it will be done in the reset
>>>>> flow.
>>>>
>>>> ... agree on keeping current approach. I actually like the idea, so we'd
>>>> end up with this simpler version for the batched ref then.
>>>
>>> Great :).
>>>
>>> So let's do the batched update only if we are not going to reset (we
>>> already know that in advance), instead of the current patch where you
>>> batch update unconditionally and then
>>> unlock_put in case reset was performed (which is just redundant and
>>> confusing).
>>>
>>> Please note that if open_locked fails you need to goto unlock_put.
>>
>> Sorry, I'm not quite sure I follow you here; are you /now/ commenting on
>> the original patch or on the already updated diff I did from my very last
>> email, that is, http://patchwork.ozlabs.org/patch/695564/ ?
>
> I am commenting on this patch "V2 2/4".
> this patch will take batched ref count unconditionally and release the
> extra refs "unlock_put" if reset was performed.
> I am asking to take the batched ref only if we are not going to reset
> in advance to save the extra "unlock_put"
Okay, sure, it's clear and we discussed about that already earlier, and
my response back then was the diff in the /end/ of this email here:
https://www.spinics.net/lists/netdev/msg404709.html
Hence my confusion why we were back on the original patch after that. ;)
Anyway, I'll get the updated set with the suggestion out tomorrow, thanks
for the review!
>>>> Note, your "bpf_prog_add(prog, 1) // one ref for the device." is not
>>>> needed
>>>> since this we already got that one through dev_change_xdp_fd() as
>>>> mentioned.
>>>
>>> If it is not needed then why we need bpf_prog_put on mlx5e_nic_cleanup
>>> in your next patch? this doesn't look symmetric (right) !
>>> you shouldn't release a reference you didn't take.
>>> Overall with this series the driver can take num_channels refs and
>>> will release num_channels refs on mlx5e_close. we shouldn't release
>>> one extra ref on NIC cleanup.
>>
>> I already explained this in the commit description; when dev_change_xdp_fd()
>> is called and sees a valid fd, it does bpf_prog_get_type(), which calls the
>> __bpf_prog_get() taking a ref on the program (bpf_prog_inc()). That is then
>> passed down to ops->ndo_xdp(). Only in case of error from the ->ndo_xdp()
>> callback, we bpf_prog_put() this reference from dev_change_xdp_fd() side.
>>
>> For drivers that implement against this ndo, it means that you need N-1 refs
>> in addition. Have a look at the other two in-tree users, which do it
>> correctly,
>> that is mlx4_xdp_set() and nfp_net_xdp_setup().
>>
>> It's documented here (include/linux/netdevice.h) with ndo_xdp referring to
>> it:
>>
>> /* These structures hold the attributes of xdp state that are being passed
>> * to the netdevice through the xdp op.
>> */
>> enum xdp_netdev_command {
>> /* Set or clear a bpf program used in the earliest stages of packet
>> * rx. The prog will have been loaded as BPF_PROG_TYPE_XDP. The
>> callee
>> * is responsible for calling bpf_prog_put on any old progs that are
>> * stored. In case of error, the callee need not release the new
>> prog
>> * reference, but on success it takes ownership and must
>> bpf_prog_put
>> * when it is no longer used.
>> */
>> XDP_SETUP_PROG,
>> [...]
>> };
>>
>> I think reason why this is rather confusing would be that initially, it was
>> just a single prog for all queues, but it was requested along the way to
>> move
>> prog pointer down to queues instead and let them have their own ref, so that
>> some time in future individual progs from a subset of the queues can be
>> exchanged.
>>
>> Since the way xdp in mlx5 is implemented, you currently have the
>> priv->xdp_prog
>> as the control prog. That's okay right now, but requires to drop the last
>> ref
>> on priv->xdp_prog via bpf_prog_put() when netdev is dismantled and
>> priv->xdp_prog
>> still present.
>
> Got it, but i would rather make the stack cleanup those refs once it
> receives an unregester_netdev event instead of counting on netdev to
> do it, as i said before, like any other resource added to the netdev
> by the stack (vlan/mac/etc..).
>
> but this is a general discussion of xdp API, it won't block this series :).
>
> Saeed.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2016-11-17 20:03 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-16 0:04 [PATCH net-next v2 0/4] Couple of BPF refcount fixes for mlx5 Daniel Borkmann
2016-11-16 0:04 ` [PATCH net-next v2 1/4] bpf, mlx5: fix mlx5e_create_rq taking reference on prog Daniel Borkmann
2016-11-16 0:04 ` [PATCH net-next v2 2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set Daniel Borkmann
2016-11-16 12:25 ` Saeed Mahameed
2016-11-16 13:54 ` Daniel Borkmann
2016-11-16 14:30 ` Saeed Mahameed
2016-11-16 15:26 ` Daniel Borkmann
2016-11-17 9:48 ` Saeed Mahameed
2016-11-17 20:03 ` Daniel Borkmann
2016-11-16 0:04 ` [PATCH net-next v2 3/4] bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup Daniel Borkmann
2016-11-16 12:51 ` Saeed Mahameed
2016-11-16 15:45 ` Daniel Borkmann
2016-11-16 15:55 ` Daniel Borkmann
2016-11-17 9:34 ` Saeed Mahameed
2016-11-16 0:04 ` [PATCH net-next v2 4/4] bpf: add __must_check attributes to refcount manipulating helpers Daniel Borkmann
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).