Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net] ip6_gre: fix use-after-free in ip6gre_tunnel_lookup()
From: Taehee Yoo @ 2020-06-16 14:28 UTC (permalink / raw)
  To: David Miller; +Cc: Jakub Kicinski, Netdev, xeb
In-Reply-To: <20200615.181701.1458193855639723291.davem@davemloft.net>

On Tue, 16 Jun 2020 at 10:17, David Miller <davem@davemloft.net> wrote:
>

Hi David,
Thank you for the review :)

> From: Taehee Yoo <ap420073@gmail.com>
> Date: Mon, 15 Jun 2020 15:07:51 +0000
>
> > In the datapath, the ip6gre_tunnel_lookup() is used and it internally uses
> > fallback tunnel device pointer, which is fb_tunnel_dev.
> > This pointer is protected by RTNL. It's not enough to be used
> > in the datapath.
> > So, this pointer would be used after an interface is deleted.
> > It eventually results in the use-after-free problem.
> >
> > In order to avoid the problem, the new tunnel pointer variable is added,
> > which indicates a fallback tunnel device's tunnel pointer.
> > This is protected by both RTNL and RCU.
> > So, it's safe to be used in the datapath.
>  ...
>
> I'm marking this changes requested because it seems like the feedback Eric
> Dumazet provided for the ip_tunnel version of this fix applies here too.
>
> Thank you.

I will send a v2 patch soon.

Thanks a lot!
Taehee Yoo

^ permalink raw reply

* [PATCH bpf] devmap: use bpf_map_area_alloc() for allocating hash buckets
From: Toke Høiland-Jørgensen @ 2020-06-16 14:28 UTC (permalink / raw)
  To: daniel, ast; +Cc: Toke Høiland-Jørgensen, bpf, netdev, Xiumei Mu

Syzkaller discovered that creating a hash of type devmap_hash with a large
number of entries can hit the memory allocator limit for allocating
contiguous memory regions. There's really no reason to use kmalloc_array()
directly in the devmap code, so just switch it to the existing
bpf_map_area_alloc() function that is used elsewhere.

Reported-by: Xiumei Mu <xmu@redhat.com>
Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 kernel/bpf/devmap.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 0cbb72cdaf63..5fdbc776a760 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -86,12 +86,13 @@ static DEFINE_PER_CPU(struct list_head, dev_flush_list);
 static DEFINE_SPINLOCK(dev_map_lock);
 static LIST_HEAD(dev_map_list);
 
-static struct hlist_head *dev_map_create_hash(unsigned int entries)
+static struct hlist_head *dev_map_create_hash(unsigned int entries,
+					      int numa_node)
 {
 	int i;
 	struct hlist_head *hash;
 
-	hash = kmalloc_array(entries, sizeof(*hash), GFP_KERNEL);
+	hash = bpf_map_area_alloc(entries * sizeof(*hash), numa_node);
 	if (hash != NULL)
 		for (i = 0; i < entries; i++)
 			INIT_HLIST_HEAD(&hash[i]);
@@ -145,7 +146,8 @@ static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
 		return -EINVAL;
 
 	if (attr->map_type == BPF_MAP_TYPE_DEVMAP_HASH) {
-		dtab->dev_index_head = dev_map_create_hash(dtab->n_buckets);
+		dtab->dev_index_head = dev_map_create_hash(dtab->n_buckets,
+							   dtab->map.numa_node);
 		if (!dtab->dev_index_head)
 			goto free_charge;
 
@@ -232,7 +234,7 @@ static void dev_map_free(struct bpf_map *map)
 			}
 		}
 
-		kfree(dtab->dev_index_head);
+		bpf_map_area_free(dtab->dev_index_head);
 	} else {
 		for (i = 0; i < dtab->map.max_entries; i++) {
 			struct bpf_dtab_netdev *dev;
-- 
2.27.0


^ permalink raw reply related

* Re: [PATCH v4 2/3] mm, treewide: Rename kzfree() to kfree_sensitive()
From: Dan Carpenter @ 2020-06-16 14:26 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, David Howells, Jarkko Sakkinen, James Morris,
	Serge E. Hallyn, Linus Torvalds, Joe Perches, Matthew Wilcox,
	David Rientjes, Michal Hocko, Johannes Weiner, David Sterba,
	Jason A . Donenfeld, linux-mm, keyrings, linux-kernel,
	linux-crypto, linux-pm, linux-stm32, linux-amlogic,
	linux-mediatek, linuxppc-dev, virtualization, netdev, linux-ppp,
	wireguard, linux-wireless, devel, linux-scsi, target-devel,
	linux-btrfs, linux-cifs, linux-fscrypt, ecryptfs, kasan-dev,
	linux-bluetooth, linux-wpan, linux-sctp, linux-nfs,
	tipc-discussion, linux-security-module, linux-integrity
In-Reply-To: <20200616015718.7812-3-longman@redhat.com>

Last time you sent this we couldn't decide which tree it should go
through.  Either the crypto tree or through Andrew seems like the right
thing to me.

Also the other issue is that it risks breaking things if people add
new kzfree() instances while we are doing the transition.  Could you
just add a "#define kzfree kfree_sensitive" so that things continue to
compile and we can remove it in the next kernel release?

regards,
dan carpenter


^ permalink raw reply

* Re: [PATCH net] ip_tunnel: fix use-after-free in ip_tunnel_lookup()
From: Taehee Yoo @ 2020-06-16 14:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Jakub Kicinski, Netdev, pshelar
In-Reply-To: <e879112d-3285-d6d8-457b-2ae2f8d38aaf@gmail.com>

On Tue, 16 Jun 2020 at 01:02, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>

Hi Eric,
Thank you for the review!

>
> On 6/15/20 8:06 AM, Taehee Yoo wrote:
> > In the datapath, the ip_tunnel_lookup() is used and it internally uses
> > fallback tunnel device pointer, which is fb_tunnel_dev.
> > This pointer is protected by RTNL. It's not enough to be used
> > in the datapath.
> > So, this pointer would be used after an interface is deleted.
> > It eventually results in the use-after-free problem.
> >
> > In order to avoid the problem, the new tunnel pointer variable is added,
> > which indicates a fallback tunnel device's tunnel pointer.
> > This is protected by both RTNL and RCU.
> > So, it's safe to be used in the datapath.
> >
> > Test commands:
> >     ip netns add A
> >     ip netns add B
> >     ip link add eth0 type veth peer name eth1
> >     ip link set eth0 netns A
> >     ip link set eth1 netns B
> >
> >     ip netns exec A ip link set lo up
> >     ip netns exec A ip link set eth0 up
> >     ip netns exec A ip link add gre1 type gre local 10.0.0.1 \
> >           remote 10.0.0.2
> >     ip netns exec A ip link set gre1 up
> >     ip netns exec A ip a a 10.0.100.1/24 dev gre1
> >     ip netns exec A ip a a 10.0.0.1/24 dev eth0
> >
> >     ip netns exec B ip link set lo up
> >     ip netns exec B ip link set eth1 up
> >     ip netns exec B ip link add gre1 type gre local 10.0.0.2 \
> >           remote 10.0.0.1
> >     ip netns exec B ip link set gre1 up
> >     ip netns exec B ip a a 10.0.100.2/24 dev gre1
> >     ip netns exec B ip a a 10.0.0.2/24 dev eth1
> >     ip netns exec A hping3 10.0.100.2 -2 --flood -d 60000 &
> >     ip netns del B
> >
> > Splat looks like:
> > [  133.319668][    C3] BUG: KASAN: use-after-free in ip_tunnel_lookup+0x9d6/0xde0
> > [  133.343852][    C3] Read of size 4 at addr ffff8880b1701c84 by task hping3/1222
> > [  133.344724][    C3]
> > [  133.345002][    C3] CPU: 3 PID: 1222 Comm: hping3 Not tainted 5.7.0+ #591
> > [  133.345814][    C3] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> > [  133.373336][    C3] Call Trace:
> > [  133.374792][    C3]  <IRQ>
> > [  133.375205][    C3]  dump_stack+0x96/0xdb
> > [  133.375789][    C3]  print_address_description.constprop.6+0x2cc/0x450
> > [  133.376720][    C3]  ? ip_tunnel_lookup+0x9d6/0xde0
> > [  133.377431][    C3]  ? ip_tunnel_lookup+0x9d6/0xde0
> > [  133.378130][    C3]  ? ip_tunnel_lookup+0x9d6/0xde0
> > [  133.378851][    C3]  kasan_report+0x154/0x190
> > [  133.379494][    C3]  ? ip_tunnel_lookup+0x9d6/0xde0
> > [  133.380200][    C3]  ip_tunnel_lookup+0x9d6/0xde0
> > [  133.380894][    C3]  __ipgre_rcv+0x1ab/0xaa0 [ip_gre]
> > [  133.381630][    C3]  ? rcu_read_lock_sched_held+0xc0/0xc0
> > [  133.382429][    C3]  gre_rcv+0x304/0x1910 [ip_gre]
> > [ ... ]
> >
> > Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
> > Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> > ---
> >  include/net/ip_tunnels.h |  1 +
> >  net/ipv4/ip_tunnel.c     | 11 ++++++++---
> >  2 files changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
> > index 076e5d7db7d3..7442c517bb75 100644
> > --- a/include/net/ip_tunnels.h
> > +++ b/include/net/ip_tunnels.h
> > @@ -164,6 +164,7 @@ struct ip_tunnel_net {
> >       struct rtnl_link_ops *rtnl_link_ops;
> >       struct hlist_head tunnels[IP_TNL_HASH_SIZE];
> >       struct ip_tunnel __rcu *collect_md_tun;
> > +     struct ip_tunnel __rcu *fb_tun;
> >       int type;
> >  };
> >
> > diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> > index f4f1d11eab50..285b863e2fcc 100644
> > --- a/net/ipv4/ip_tunnel.c
> > +++ b/net/ipv4/ip_tunnel.c
> > @@ -162,8 +162,9 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
> >       if (t && t->dev->flags & IFF_UP)
> >               return t;
> >
> > -     if (itn->fb_tunnel_dev && itn->fb_tunnel_dev->flags & IFF_UP)
> > -             return netdev_priv(itn->fb_tunnel_dev);
> > +     t = rcu_dereference(itn->fb_tun);
> > +     if (t && t->dev->flags & IFF_UP)
> > +             return t;
> >
>
> There is no need for a new variable.
>
> Your patch does not add any new rcu grace period, so it seems obvious that you
> relied on existing grace periods.
>
> The real question is why ip_tunnel_uninit() does not clear itn->fb_tunnel_dev
>
> And of course why ip_tunnel_lookup() does not a READ_ONCE()
>
> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> index f4f1d11eab502290f9d74e2c8aafd69bceb58763..2416aa33d3645e1da967ec4c914564c5727a4d80 100644
> --- a/net/ipv4/ip_tunnel.c
> +++ b/net/ipv4/ip_tunnel.c
> @@ -87,6 +87,7 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
>  {
>         unsigned int hash;
>         struct ip_tunnel *t, *cand = NULL;
> +       struct net_device *ndev;
>         struct hlist_head *head;
>
>         hash = ip_tunnel_hash(key, remote);
> @@ -162,8 +163,9 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
>         if (t && t->dev->flags & IFF_UP)
>                 return t;
>
> -       if (itn->fb_tunnel_dev && itn->fb_tunnel_dev->flags & IFF_UP)
> -               return netdev_priv(itn->fb_tunnel_dev);
> +       ndev = READ_ONCE(itn->fb_tunnel_dev);
> +       if (ndev && ndev->flags & IFF_UP)
> +               return netdev_priv(ndev);
>
>         return NULL;
>  }
>

Thank you for the suggestion.
I tested this approach and it works well,
So, I will send a v2 patch soon.

Thanks a lot!
Taehee Yoo

^ permalink raw reply

* Re: [PATCH net v2 2/2] net/sched: act_gate: fix configuration of the periodic timer
From: Vladimir Oltean @ 2020-06-16 14:23 UTC (permalink / raw)
  To: Davide Caratti; +Cc: Po Liu, Cong Wang, David S . Miller, netdev
In-Reply-To: <429bc64106ac69c8291f4466ddbaa2b48b8e16c4.camel@redhat.com>

On Tue, 16 Jun 2020 at 15:43, Davide Caratti <dcaratti@redhat.com> wrote:
>
> On Tue, 2020-06-16 at 13:38 +0300, Vladimir Oltean wrote:
> > Hi Davide,
> >
> > On Tue, 16 Jun 2020 at 13:13, Davide Caratti <dcaratti@redhat.com> wrote:
> > > hello Vladimir,
> > >
> > > thanks a lot for reviewing this.
> > >
> > > On Tue, 2020-06-16 at 00:55 +0300, Vladimir Oltean wrote:
>
> [...]
>
> > > > What if you split the "replace" functionality of gate_setup_timer into
> > > > a separate gate_cancel_timer function, which you could call earlier
> > > > (before taking the spin lock)?
> > >
> > > I think it would introduce the following 2 problems:
> > >
> > > problem #1) a race condition, see below:
>
> [...]
>
> > > > @@ -433,6 +448,11 @@ static int tcf_gate_init(struct net *net, struct nlattr *nla,
> > > > >         if (goto_ch)
> > > > >                 tcf_chain_put_by_act(goto_ch);
> > > > >  release_idr:
> > > > > +       /* action is not in: hitimer can be inited without taking tcf_lock */
> > > > > +       if (ret == ACT_P_CREATED)
> > > > > +               gate_setup_timer(gact, gact->param.tcfg_basetime,
> > > > > +                                gact->tk_offset, gact->param.tcfg_clockid,
> > > > > +                                true);
> > >
> > > please note, here I felt the need to add a comment, because when ret ==
> > > ACT_P_CREATED the action is not inserted in any list, so there is no
> > > concurrent writer of gact-> members for that action.
> > >
> >
> > Then please rephrase the comment. I had read it and it still wasn't
> > clear at all for me what you were talking about.
>
> something like:
>
> /* action is not yet inserted in any list: it's safe to init hitimer
>  * without taking tcf_lock.
>  */
>
> would be ok?
>

Yes, better.

> [...]
>
> > I wonder, could you call tcf_gate_cleanup instead of just canceling the
> > hrtimer?
>
> not with the current tcf_gate_cleanup() [1] and parse_gate_list() [2],
> because it would introduce another bug: 'p->entries' gets cleared on
> action overwrite after being successfully created here:
>
> 395         if (tb[TCA_GATE_ENTRY_LIST]) {
> 396                 err = parse_gate_list(tb[TCA_GATE_ENTRY_LIST], p, extack);
> 397                 if (err < 0)
> 398                         goto chain_put;
> 399         }
>
>
> like mentioned earlier, 'hitimer' can not be canceled/re-initialized easily when
> tcf_gate_init() still has a possible error path. And in my understanding
> 'p->entries' must be consistent when the timer is initialized.
>
> IMO, the correct way to handle 'entries' is to:
>
> - populate the list on a local variable, before taking the spinlock and
> allocating the IDR
>
> - assign to p->entries after validation is successful (with the spinlock
> taken). Same as what was done with 'cycletime' in patch 1/2, but with the
> variable initialized (btw, thanks for catching this), and free the old
> list in case of action replace
>
> - release the newly allocated list in the error path of tcf_gate_init()
>
> (but again, this would be a fix for 'entries' - not for 'hitimer', so I
> plan to work on it as a separate patch, that fits better 'net-next' rather
> than 'net').
>

Targeting net-next would mean that the net tree would still keep
appending to p->entries upon action replacement, instead of just
replacing p->entries?

> --
> davide
>
> [1] https://elixir.bootlin.com/linux/v5.8-rc1/source/net/sched/act_gate.c#L450
> [2] https://elixir.bootlin.com/linux/v5.8-rc1/source/net/sched/act_gate.c#L235
>

Thanks,
-Vladimir

^ permalink raw reply

* Re: [PATCH net v3 2/4] flow_offload: fix incorrect cb_priv check for flow_block_cb
From: wenxu @ 2020-06-16 14:20 UTC (permalink / raw)
  To: Simon Horman; +Cc: netdev, davem, pablo, vladbu
In-Reply-To: <20200616105123.GA21396@netronome.com>


在 2020/6/16 18:51, Simon Horman 写道:
> On Tue, Jun 16, 2020 at 11:19:38AM +0800, wenxu@ucloud.cn wrote:
>> From: wenxu <wenxu@ucloud.cn>
>>
>> In the function __flow_block_indr_cleanup, The match stataments
>> this->cb_priv == cb_priv is always false, the flow_block_cb->cb_priv
>> is totally different data with the flow_indr_dev->cb_priv.
>>
>> Store the representor cb_priv to the flow_block_cb->indr.cb_priv in
>> the driver.
>>
>> Fixes: 1fac52da5942 ("net: flow_offload: consolidate indirect flow_block infrastructure")
>> Signed-off-by: wenxu <wenxu@ucloud.cn>
> Hi Wenxu,
>
> I wonder if this can be resolved by using the cb_ident field of struct
> flow_block_cb.
>
> I observe that mlx5e_rep_indr_setup_block() seems to be the only call-site
> where the value of the cb_ident parameter of flow_block_cb_alloc() is
> per-block rather than per-device. So part of my proposal is to change
> that.

I check all the xxdriver_indr_setup_block. It seems all the cb_ident parameter of

flow_block_cb_alloc is per-block. Both in the nfp_flower_setup_indr_tc_block

and bnxt_tc_setup_indr_block.


nfp_flower_setup_indr_tc_block:

struct nfp_flower_indr_block_cb_priv *cb_priv;

block_cb = flow_block_cb_alloc(nfp_flower_setup_indr_block_cb,
                                               cb_priv, cb_priv,
                                               nfp_flower_setup_indr_tc_release);


bnxt_tc_setup_indr_block:

struct bnxt_flower_indr_block_cb_priv *cb_priv;

block_cb = flow_block_cb_alloc(bnxt_tc_setup_indr_block_cb,
                                               cb_priv, cb_priv,
                                               bnxt_tc_setup_indr_rel);


And the function flow_block_cb_is_busy called in most place. Pass the

parameter as cb_priv but not cb_indent .







>
> The other part of my proposal is to make use of cb_ident in
> __flow_block_indr_cleanup(). Which does seem to match the intended
> purpose of cb_ident. Perhaps it would also be good to document what
> the intended purpose of cb_ident (and the other fields of struct
> flow_block_cb) is.
>
> Compile tested only.
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
> index a62bcf0cf512..4de6fcae5252 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
> @@ -438,7 +438,7 @@ mlx5e_rep_indr_setup_block(struct net_device *netdev,
>  		list_add(&indr_priv->list,
>  			 &rpriv->uplink_priv.tc_indr_block_priv_list);
>  
> -		block_cb = flow_block_cb_alloc(setup_cb, indr_priv, indr_priv,
> +		block_cb = flow_block_cb_alloc(setup_cb, rpriv, indr_priv,
>  					       mlx5e_rep_indr_block_unbind);
>  		if (IS_ERR(block_cb)) {
>  			list_del(&indr_priv->list);
> diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
> index b288d2f03789..d281fb182894 100644
> --- a/net/core/flow_offload.c
> +++ b/net/core/flow_offload.c
> @@ -373,14 +373,13 @@ int flow_indr_dev_register(flow_indr_block_bind_cb_t *cb, void *cb_priv)
>  EXPORT_SYMBOL(flow_indr_dev_register);
>  
>  static void __flow_block_indr_cleanup(void (*release)(void *cb_priv),
> -				      void *cb_priv,
> +				      void *cb_ident,
>  				      struct list_head *cleanup_list)
>  {
>  	struct flow_block_cb *this, *next;
>  
>  	list_for_each_entry_safe(this, next, &flow_block_indr_list, indr.list) {
> -		if (this->release == release &&
> -		    this->cb_priv == cb_priv) {
> +		if (this->release == release && this->cb_ident == cb_ident) {
>  			list_move(&this->indr.list, cleanup_list);
>  			return;
>  		}
>

^ permalink raw reply

* RATE not being printed on tc -s class show dev XXXX
From: Roberto J. Blandino Cisneros @ 2020-06-16 14:14 UTC (permalink / raw)
  To: netdev

Good Morning.

I am using debian buster 10.4 with iproute2 compile version 5.7.0.

I am testing Traffic Control but on the statistics no Rate value is shown.

For example in the following link "https://paste.ubuntu.com/10963208/", 
i see following output:

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
jjo@jjo-x220:~/tmp$ ./iproute2-3.19.0-jjo/tc/tc -s class show dev eth0 
class htb 1:10 parent 1:1 prio 0 rate 2500Mbit ceil 2500Mbit burst 
15000b cburst 1250b Sent 699143 bytes 5790 pkt (dropped 0, overlimits 0 
requeues 0) EST32 rate 0bit 0pps backlog 0b 0p requeues 0 lended: 5733 
borrowed: 0 giants: 0 tokens: 761 ctokens: 74
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I am seing "rate 0bit".

But installing from debian package iproute2 i got nothing so i decide to 
compile iproute2 using version 5.7.0

But my output is the same as below:

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# tc -s class show dev enp4s0 | grep 1:30 -A 4
class htb 1:30 parent 1:1 leaf 30: prio 1 rate 3Mbit ceil 3Mbit burst 
5Kb cburst 1599b
  Sent 27793441 bytes 53351 pkt (dropped 26, overlimits 13707 requeues 0)
  backlog 0b 0p requeues 0
  lended: 52095 borrowed: 0 giants: 0
  tokens: 209078 ctokens: 62406
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I did following changes in tc/tc_util.c for printing the rate enclosure 
after and before condicion of printing rate:

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
843a844
 > printf("->");
865a867
 > printf("<-");
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The section of tc_util.c show like this :

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
printf("->");
         if (tbs[TCA_STATS_RATE_EST64]) {
struct gnet_stats_rate_est64 re = {0};

memcpy(&re, RTA_DATA(tbs[TCA_STATS_RATE_EST64]),
MIN(RTA_PAYLOAD(tbs[TCA_STATS_RATE_EST64]),
                            sizeof(re)));
print_string(PRINT_FP, NULL, "\n%s", prefix);
print_lluint(PRINT_JSON, "rate", NULL, re.bps);
print_string(PRINT_FP, NULL, "rate %s",
                              sprint_rate(re.bps, b1));
print_lluint(PRINT_ANY, "pps", " %llupps", re.pps);
         } else if (tbs[TCA_STATS_RATE_EST]) {
struct gnet_stats_rate_est re = {0};

memcpy(&re, RTA_DATA(tbs[TCA_STATS_RATE_EST]),
                        MIN(RTA_PAYLOAD(tbs[TCA_STATS_RATE_EST]), 
sizeof(re)));
print_string(PRINT_FP, NULL, "\n%s", prefix);
print_uint(PRINT_JSON, "rate", NULL, re.bps);
print_string(PRINT_FP, NULL, "rate %s",
                              sprint_rate(re.bps, b1));
print_uint(PRINT_ANY, "pps", " %upps", re.pps);
         }
printf("<-");
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

And as you can see no rate is being printed, does int need to be enable 
something else to print the rate?, now with the changes done i Got:

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# tc -s class show dev enp4s0 | grep 1:30 -A 4
class htb 1:30 parent 1:1 leaf 30: prio 1 rate 3Mbit ceil 3Mbit burst 
5Kb cburst 1599b
  Sent 35713116 bytes 72643 pkt (dropped 26, overlimits 16464 requeues 
0) -><-
  backlog 0b 0p requeues 0
  lended: 71083 borrowed: 0 giants: 0
  tokens: 209078 ctokens: 62406
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As you can see "-><-" is being printed.

Is necessary to do something else to get tbs[TCA_STATS_RATE_EST] working?

Even i compile the last kernel 5.7.2.

Any guide option or value that must need the processor, network card, 
hardware, etc for this?



^ permalink raw reply

* RATE not being printed on tc -s class show dev XXXX
From: Roberto J. Blandino Cisneros @ 2020-06-16 14:05 UTC (permalink / raw)
  To: netdev

Good Morning.

I am using debian buster 10.4 with iproute2 compile version 5.7.0.

I am testing Traffic Control but on the statistics no Rate value is shown.

For example in the following link "https://paste.ubuntu.com/10963208/", 
i see following output:

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
jjo@jjo-x220:~/tmp$ ./iproute2-3.19.0-jjo/tc/tc -s class show dev eth0 
class htb 1:10 parent 1:1 prio 0 rate 2500Mbit ceil 2500Mbit burst 
15000b cburst 1250b Sent 699143 bytes 5790 pkt (dropped 0, overlimits 0 
requeues 0) EST32 rate 0bit 0pps backlog 0b 0p requeues 0 lended: 5733 
borrowed: 0 giants: 0 tokens: 761 ctokens: 74
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I am seing "rate 0bit".

But installing from debian package iproute2 i got nothing so i decide to 
compile iproute2 using version 5.7.0

But my output is the same as below:

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# tc -s class show dev enp4s0 | grep 1:30 -A 4
class htb 1:30 parent 1:1 leaf 30: prio 1 rate 3Mbit ceil 3Mbit burst 
5Kb cburst 1599b
  Sent 27793441 bytes 53351 pkt (dropped 26, overlimits 13707 requeues 0)
  backlog 0b 0p requeues 0
  lended: 52095 borrowed: 0 giants: 0
  tokens: 209078 ctokens: 62406
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I did following changes in tc/tc_util.c for printing the rate enclosure 
after and before condicion of printing rate:

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
843a844
 > printf("->");
865a867
 > printf("<-");
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The section of tc_util.c show like this :

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
printf("->");
         if (tbs[TCA_STATS_RATE_EST64]) {
struct gnet_stats_rate_est64 re = {0};

memcpy(&re, RTA_DATA(tbs[TCA_STATS_RATE_EST64]),
MIN(RTA_PAYLOAD(tbs[TCA_STATS_RATE_EST64]),
                            sizeof(re)));
print_string(PRINT_FP, NULL, "\n%s", prefix);
print_lluint(PRINT_JSON, "rate", NULL, re.bps);
print_string(PRINT_FP, NULL, "rate %s",
                              sprint_rate(re.bps, b1));
print_lluint(PRINT_ANY, "pps", " %llupps", re.pps);
         } else if (tbs[TCA_STATS_RATE_EST]) {
struct gnet_stats_rate_est re = {0};

memcpy(&re, RTA_DATA(tbs[TCA_STATS_RATE_EST]),
                        MIN(RTA_PAYLOAD(tbs[TCA_STATS_RATE_EST]), 
sizeof(re)));
print_string(PRINT_FP, NULL, "\n%s", prefix);
print_uint(PRINT_JSON, "rate", NULL, re.bps);
print_string(PRINT_FP, NULL, "rate %s",
                              sprint_rate(re.bps, b1));
print_uint(PRINT_ANY, "pps", " %upps", re.pps);
         }
printf("<-");
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

And as you can see no rate is being printed, does int need to be enable 
something else to print the rate?, now with the changes done i Got:

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# tc -s class show dev enp4s0 | grep 1:30 -A 4
class htb 1:30 parent 1:1 leaf 30: prio 1 rate 3Mbit ceil 3Mbit burst 
5Kb cburst 1599b
  Sent 35713116 bytes 72643 pkt (dropped 26, overlimits 16464 requeues 
0) -><-
  backlog 0b 0p requeues 0
  lended: 71083 borrowed: 0 giants: 0
  tokens: 209078 ctokens: 62406
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As you can see "-><-" is being printed.

Is necessary to do something else to get tbs[TCA_STATS_RATE_EST] working?

Even i compile the last kernel 5.7.2.

Any guide option or value that must need the processor, network card, 
hardware, etc for this?


-- 

*Roberto Blandino*

NOC

IBW Nicaragua

_roberto.blandino@ibw.com.ni_

Tel: (505)  2278-6328 Ext-5315



^ permalink raw reply

* Re: [PATCH ipsec] xfrm: policy: match with both mark and mask on user interfaces
From: Xin Long @ 2020-06-16 14:04 UTC (permalink / raw)
  To: Tobias Brunner
  Cc: network dev, Steffen Klassert, Herbert Xu, David S. Miller,
	Jamal Hadi Salim, Sabrina Dubroca
In-Reply-To: <b1656226-4caf-466e-8175-48431752286d@strongswan.org>

On Mon, Jun 15, 2020 at 5:56 PM Tobias Brunner <tobias@strongswan.org> wrote:
>
> Hi Xin,
>
> > To fix this duplicated policies issue, and also fix the issue in
> > commit ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list"),
> > when doing add/del/get/update on user interfaces, this patch is to change
> > to look up a policy with both mark and mask by doing:
> >
> >   mark.v == pol->mark.v && mark.m == pol->mark.m
>
> Looks good, thanks a lot for your work on this.  All tests in our
> regression test suite complete successfully with this patch applied.
>
> Tested-by: Tobias Brunner <tobias@strongswan.org>
>
> > and leave the check:
> >
> >   ((mark.v & mark.m) & pol->mark.m) == pol->mark.v.
> >
> > for tx/rx path only.
>
> If you are referring to the check in xfrm_policy_match() it's actually:
>
>   (fl->flowi_mark & pol->mark.m) != pol->mark.v
>
> Or more generically something like:
>
>   (mark & pol->mark.m) == pol->mark.v
>
> As we only have the mark on the packets/flow (no mask) to match against.
>
> > -static bool xfrm_policy_mark_match(struct xfrm_policy *policy,
> > +static bool xfrm_policy_mark_match(const struct xfrm_mark *mark,
> >                                  struct xfrm_policy *pol)
> >  {
> > -     if (policy->mark.v == pol->mark.v &&
> > -         policy->priority == pol->priority)
> > -             return true;
> > -
> > -     return false;
> > +     return mark->v == pol->mark.v && mark->m == pol->mark.m;
> >  }
>
> I guess you could make that function `static inline`.
>
Thanks, Tobias, I will post v2 with your suggestion.

Just note that I have another patch similar to this one,
but for xfrm_state's mark. I will post it later too.
Please also check if it may cause any regression.

^ permalink raw reply

* Re: [PATCH net] tcp: grow window for OOO packets only for SACK flows
From: Neal Cardwell @ 2020-06-16 13:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Eric Dumazet, Yuchung Cheng,
	Venkat Venkatsubra
In-Reply-To: <20200616033707.145423-1-edumazet@google.com>

On Mon, Jun 15, 2020 at 11:37 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Back in 2013, we made a change that broke fast retransmit
> for non SACK flows.
>
> Indeed, for these flows, a sender needs to receive three duplicate
> ACK before starting fast retransmit. Sending ACK with different
> receive window do not count.
>
> Even if enabling SACK is strongly recommended these days,
> there still are some cases where it has to be disabled.
>
> Not increasing the window seems better than having to
> rely on RTO.
...
> Fixes: 4e4f1fc22681 ("tcp: properly increase rcv_ssthresh for ofo packets")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
> ---

Thanks for the fix, Eric!

Acked-by: Neal Cardwell <ncardwell@google.com>

neal

^ permalink raw reply

* [PATCH] net: ath10k: fix memcpy size from untrusted input
From: Zekun Shen @ 2020-06-16 13:25 UTC (permalink / raw)
  Cc: Zekun Shen, Kalle Valo, David S. Miller, Jakub Kicinski, ath10k,
	linux-wireless, netdev, linux-kernel
In-Reply-To: <87tuzbihbg.fsf@codeaurora.org>

A compromized ath10k peripheral is able to control the size argument
of memcpy in ath10k_pci_hif_exchange_bmi_msg.

The min result from previous line is not used as the size argument
for memcpy. Instead, xfer.resp_len comes from untrusted stream dma
input. The value comes from "nbytes" in ath10k_pci_bmi_recv_data,
which is set inside _ath10k_ce_completed_recv_next_nolock with the line

nbytes = __le16_to_cpu(sdesc.nbytes);

sdesc is a stream dma region which device can write to.

Signed-off-by: Zekun Shen <bruceshenzk@gmail.com>
---
KASAN report stacktrace:
[    6.293972] ==================================================================
[    6.295696] BUG: KASAN: slab-out-of-bounds in ath10k_pci_hif_exchange_bmi_msg+0xb2f/0x14d0 [ath10k_pci]
[    6.297031] Read of size 9769 at addr ffff888034c49c00 by task kworker/u2:2/82
[    6.298054]
[    6.298288] CPU: 0 PID: 82 Comm: kworker/u2:2 Tainted: G        W         5.6.0 #51
[    6.299410] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/4
[    6.301107] Workqueue: ath10k_wq ath10k_core_register_work [ath10k_core]
[    6.302203] Call Trace:
[    6.302596]  dump_stack+0x75/0x9b
[    6.303114]  ? ath10k_pci_hif_exchange_bmi_msg+0xb2f/0x14d0 [ath10k_pci]
[    6.304096]  print_address_description.constprop.5+0x16/0x310
[    6.304933]  ? ath10k_pci_hif_exchange_bmi_msg+0xb2f/0x14d0 [ath10k_pci]
[    6.305898]  ? ath10k_pci_hif_exchange_bmi_msg+0xb2f/0x14d0 [ath10k_pci]
[    6.306873]  __kasan_report+0x158/0x1c0
[    6.307441]  ? ath10k_pci_hif_exchange_bmi_msg+0xb2f/0x14d0 [ath10k_pci]
[    6.308432]  kasan_report+0xe/0x20
[    6.308938]  check_memory_region+0x15d/0x1b0
[    6.309564]  memcpy+0x1f/0x50
[    6.310006]  ath10k_pci_hif_exchange_bmi_msg+0xb2f/0x14d0 [ath10k_pci]
[    6.310947]  ? ath10k_pci_rx_replenish_retry+0x170/0x170 [ath10k_pci]
[    6.311875]  ? check_unmap+0x64e/0x1bb0
[    6.312439]  ? _raw_write_lock+0xd0/0xd0
[    6.313045]  ? log_store.constprop.29+0x267/0x440
[    6.313732]  ? debug_dma_free_coherent+0x1c0/0x220
[    6.314440]  ? debug_dma_alloc_coherent+0x2f0/0x2f0
[    6.315156]  ath10k_bmi_get_target_info+0x1b8/0x350 [ath10k_core]
[    6.316058]  ? apic_timer_interrupt+0xa/0x20
[    6.316710]  ? ath10k_bmi_done+0x330/0x330 [ath10k_core]
[    6.317509]  ? ath10k_pci_diag_write_mem+0x31e/0x570 [ath10k_pci]
[    6.318402]  ? __kasan_check_read+0x10/0x10
[    6.319037]  ? _raw_spin_lock_irqsave+0x7b/0xd0
[    6.319755]  ? _raw_write_lock_irqsave+0xd0/0xd0
[    6.320463]  ? lock_timer_base+0xbc/0x150
[    6.321047]  ? enqueue_timer+0xda/0x270
[    6.321612]  ? mod_timer+0x406/0xad0
[    6.322147]  ? timer_reduce+0xb00/0xb00
[    6.322707]  ? _raw_write_lock_irqsave+0xd0/0xd0
[    6.323380]  ? ath10k_pci_sleep.part.14+0x163/0x1c0 [ath10k_pci]
[    6.324248]  ? ath10k_bus_pci_write32+0x158/0x1b0 [ath10k_pci]
[    6.325099]  ? ath10k_pci_hif_power_up+0x256/0x690 [ath10k_pci]
[    6.325970]  ? __switch_to_asm+0x40/0x70
[    6.326565]  ath10k_core_register_work+0x799/0x2070 [ath10k_core]
[    6.327453]  ? __switch_to_asm+0x34/0x70
[    6.328028]  ? __switch_to_asm+0x40/0x70
[    6.328603]  ? __switch_to+0x5d5/0xde0
[    6.329144]  ? __switch_to_asm+0x34/0x70
[    6.329754]  ? ath10k_core_stop+0xf0/0xf0 [ath10k_core]
[    6.330521]  ? __schedule+0x88c/0x1820
[    6.331068]  ? read_word_at_a_time+0xe/0x20
[    6.331675]  ? strscpy+0xa3/0x320
[    6.332162]  process_one_work+0x83c/0x14c0
[    6.332777]  worker_thread+0x82/0xee0
[    6.333335]  ? __kthread_parkme+0x8a/0x100
[    6.333955]  ? process_one_work+0x14c0/0x14c0
[    6.334592]  kthread+0x2f1/0x3a0
[    6.335070]  ? kthread_create_on_node+0xc0/0xc0
[    6.335785]  ret_from_fork+0x35/0x40
[    6.367721] ==================================================================

 drivers/net/wireless/ath/ath10k/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 1d941d53f..ad28d9156 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -2184,7 +2184,7 @@ int ath10k_pci_hif_exchange_bmi_msg(struct ath10k *ar,
 
 	if (ret == 0 && resp_len) {
 		*resp_len = min(*resp_len, xfer.resp_len);
-		memcpy(resp, tresp, xfer.resp_len);
+		memcpy(resp, tresp, *resp_len);
 	}
 err_dma:
 	kfree(treq);
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH] [bpf] xdp_redirect_cpu_user: Fix null pointer dereference
From: Daniel Borkmann @ 2020-06-16 13:10 UTC (permalink / raw)
  To: Gaurav Singh, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	open list:BPF (Safe dynamic programs and tools),
	open list:BPF (Safe dynamic programs and tools), open list
In-Reply-To: <20200614190434.31321-1-gaurav1086@gmail.com>

On 6/14/20 9:04 PM, Gaurav Singh wrote:
> Memset() on the pointer right after malloc() can cause
> a null pointer dereference if it failed to allocate memory.
> Fix this by replacing malloc/memset with a single calloc().
> 
> Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>

Squashed all three same fixes into one and pushed to bpf, thanks!

> @@ -222,11 +219,9 @@ static struct datarec *alloc_record_per_cpu(void)
>   static struct stats_record *alloc_stats_record(void)
>   {
>   	struct stats_record *rec;
> -	int i, size;
> +	int i;
>   
> -	size = sizeof(*rec) + n_cpus * sizeof(struct record);
> -	rec = malloc(size);
> -	memset(rec, 0, size);
> +	rec = calloc(n_cpus + 1, sizeof(struct record));

For the record, this one is buggy, so I fixed it up as well.

>   	if (!rec) {
>   		fprintf(stderr, "Mem alloc error\n");
>   		exit(EXIT_FAIL_MEM);
> 

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH v4 1/3] mm/slab: Use memzero_explicit() in kzfree()
From: Waiman Long @ 2020-06-16 13:05 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Andrew Morton, David Howells, Jarkko Sakkinen, James Morris,
	Serge E. Hallyn, Linus Torvalds, Joe Perches, Matthew Wilcox,
	David Rientjes, Michal Hocko, Johannes Weiner, Dan Carpenter,
	David Sterba, Jason A . Donenfeld, linux-mm, keyrings,
	linux-kernel, linux-crypto, linux-pm, linux-stm32, linux-amlogic,
	linux-mediatek, linuxppc-dev, virtualization, netdev, linux-ppp,
	wireguard, linux-wireless, devel, linux-scsi, target-devel,
	linux-btrfs, linux-cifs, linux-fscrypt, ecryptfs, kasan-dev,
	linux-bluetooth, linux-wpan, linux-sctp, linux-nfs,
	tipc-discussion, linux-security-module, linux-integrity, stable
In-Reply-To: <20200616033035.GB902@sol.localdomain>

On 6/15/20 11:30 PM, Eric Biggers wrote:
> On Mon, Jun 15, 2020 at 09:57:16PM -0400, Waiman Long wrote:
>> The kzfree() function is normally used to clear some sensitive
>> information, like encryption keys, in the buffer before freeing it back
>> to the pool. Memset() is currently used for the buffer clearing. However,
>> it is entirely possible that the compiler may choose to optimize away the
>> memory clearing especially if LTO is being used. To make sure that this
>> optimization will not happen, memzero_explicit(), which is introduced
>> in v3.18, is now used in kzfree() to do the clearing.
>>
>> Fixes: 3ef0e5ba4673 ("slab: introduce kzfree()")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   mm/slab_common.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> index 9e72ba224175..37d48a56431d 100644
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -1726,7 +1726,7 @@ void kzfree(const void *p)
>>   	if (unlikely(ZERO_OR_NULL_PTR(mem)))
>>   		return;
>>   	ks = ksize(mem);
>> -	memset(mem, 0, ks);
>> +	memzero_explicit(mem, ks);
>>   	kfree(mem);
>>   }
>>   EXPORT_SYMBOL(kzfree);
> This is a good change, but the commit message isn't really accurate.  AFAIK, no
> one has found any case where this memset() gets optimized out.  And even with
> LTO, it would be virtually impossible due to all the synchronization and global
> data structures that kfree() uses.  (Remember that this isn't the C standard
> function "free()", so the compiler can't assign it any special meaning.)
> Not to mention that LTO support isn't actually upstream yet.
>
> I still agree with the change, but it might be helpful if the commit message
> were honest that this is really a hardening measure and about properly conveying
> the intent.  As-is this sounds like a critical fix, which might confuse people.

Yes, I agree that the commit log may look a bit scary. How about the 
following:

The kzfree() function is normally used to clear some sensitive
information, like encryption keys, in the buffer before freeing it back
to the pool. Memset() is currently used for buffer clearing. However
unlikely, there is still a non-zero probability that the compiler may
choose to optimize away the memory clearing especially if LTO is being
used in the future. To make sure that this optimization will never
happen, memzero_explicit(), which is introduced in v3.18, is now used
in kzfree() to future-proof it.

Cheers,
Longman


^ permalink raw reply

* Re: [PATCH] dpaa_eth: fix usage as DSA master, try 3
From: Vladimir Oltean @ 2020-06-16 13:00 UTC (permalink / raw)
  To: David Miller
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot,
	Madalin Bucur (OSS), netdev, joakim.tjernlund
In-Reply-To: <20200525.175808.465482238114904305.davem@davemloft.net>

On Tue, 26 May 2020 at 03:58, David Miller <davem@davemloft.net> wrote:
>
> From: Vladimir Oltean <olteanv@gmail.com>
> Date: Mon, 25 May 2020 00:22:51 +0300
>
> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> >
> > The dpaa-eth driver probes on compatible string for the MAC node, and
> > the fman/mac.c driver allocates a dpaa-ethernet platform device that
> > triggers the probing of the dpaa-eth net device driver.
> >
> > All of this is fine, but the problem is that the struct device of the
> > dpaa_eth net_device is 2 parents away from the MAC which can be
> > referenced via of_node. So of_find_net_device_by_node can't find it, and
> > DSA switches won't be able to probe on top of FMan ports.
> >
> > It would be a bit silly to modify a core function
> > (of_find_net_device_by_node) to look for dev->parent->parent->of_node
> > just for one driver. We're just 1 step away from implementing full
> > recursion.
> >
> > Actually there have already been at least 2 previous attempts to make
> > this work:
> > - Commit a1a50c8e4c24 ("fsl/man: Inherit parent device and of_node")
> > - One or more of the patches in "[v3,0/6] adapt DPAA drivers for DSA":
> >   https://patchwork.ozlabs.org/project/netdev/cover/1508178970-28945-1-git-send-email-madalin.bucur@nxp.com/
> >   (I couldn't really figure out which one was supposed to solve the
> >   problem and how).
> >
> > Point being, it looks like this is still pretty much a problem today.
> > On T1040, the /sys/class/net/eth0 symlink currently points to
> >
> > ../../devices/platform/ffe000000.soc/ffe400000.fman/ffe4e6000.ethernet/dpaa-ethernet.0/net/eth0
> >
> > which pretty much illustrates the problem. The closest of_node we've got
> > is the "fsl,fman-memac" at /soc@ffe000000/fman@400000/ethernet@e6000,
> > which is what we'd like to be able to reference from DSA as host port.
> >
> > For of_find_net_device_by_node to find the eth0 port, we would need the
> > parent of the eth0 net_device to not be the "dpaa-ethernet" platform
> > device, but to point 1 level higher, aka the "fsl,fman-memac" node
> > directly. The new sysfs path would look like this:
> >
> > ../../devices/platform/ffe000000.soc/ffe400000.fman/ffe4e6000.ethernet/net/eth0
> >
> > And this is exactly what SET_NETDEV_DEV does. It sets the parent of the
> > net_device. The new parent has an of_node associated with it, and
> > of_dev_node_match already checks for the of_node of the device or of its
> > parent.
> >
> > Fixes: a1a50c8e4c24 ("fsl/man: Inherit parent device and of_node")
> > Fixes: c6e26ea8c893 ("dpaa_eth: change device used")
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> Applied and queued up for -stable, thanks.

Joakim notified me that this breaks stable trees.
It turns out that my assessment about who-broke-who was wrong.
The real Fixes: tag should have been:

Fixes: 060ad66f9795 ("dpaa_eth: change DMA device")

which changes the device on which SET_NETDEV_DEV is made.

git describe --tags 060ad66f97954
v5.4-rc3-783-g060ad66f9795

Which means that it shouldn't have been backported to 4.19 and below.

What is the procedure to revert it from those stable trees? Would I
need to revert this patch in "net" and apply another one with the
correct Fixes: tag?

Thanks,
-Vladimir

^ permalink raw reply

* Re: [PATCH net v2 2/2] net/sched: act_gate: fix configuration of the periodic timer
From: Davide Caratti @ 2020-06-16 12:42 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: Po Liu, Cong Wang, David S . Miller, netdev
In-Reply-To: <CA+h21hrCScMMA9cm0fhF+eLRWda403pX=t3PKRoBhkE8rrR-rw@mail.gmail.com>

On Tue, 2020-06-16 at 13:38 +0300, Vladimir Oltean wrote:
> Hi Davide,
> 
> On Tue, 16 Jun 2020 at 13:13, Davide Caratti <dcaratti@redhat.com> wrote:
> > hello Vladimir,
> > 
> > thanks a lot for reviewing this.
> > 
> > On Tue, 2020-06-16 at 00:55 +0300, Vladimir Oltean wrote:

[...]

> > > What if you split the "replace" functionality of gate_setup_timer into
> > > a separate gate_cancel_timer function, which you could call earlier
> > > (before taking the spin lock)?
> > 
> > I think it would introduce the following 2 problems:
> >
> > problem #1) a race condition, see below:

[...]

> > > @@ -433,6 +448,11 @@ static int tcf_gate_init(struct net *net, struct nlattr *nla,
> > > >         if (goto_ch)
> > > >                 tcf_chain_put_by_act(goto_ch);
> > > >  release_idr:
> > > > +       /* action is not in: hitimer can be inited without taking tcf_lock */
> > > > +       if (ret == ACT_P_CREATED)
> > > > +               gate_setup_timer(gact, gact->param.tcfg_basetime,
> > > > +                                gact->tk_offset, gact->param.tcfg_clockid,
> > > > +                                true);
> > 
> > please note, here I felt the need to add a comment, because when ret ==
> > ACT_P_CREATED the action is not inserted in any list, so there is no
> > concurrent writer of gact-> members for that action.
> > 
> 
> Then please rephrase the comment. I had read it and it still wasn't
> clear at all for me what you were talking about.

something like:

/* action is not yet inserted in any list: it's safe to init hitimer 
 * without taking tcf_lock.
 */

would be ok?

[...]

> I wonder, could you call tcf_gate_cleanup instead of just canceling the
> hrtimer?

not with the current tcf_gate_cleanup() [1] and parse_gate_list() [2],
because it would introduce another bug: 'p->entries' gets cleared on
action overwrite after being successfully created here:

395         if (tb[TCA_GATE_ENTRY_LIST]) {
396                 err = parse_gate_list(tb[TCA_GATE_ENTRY_LIST], p, extack);
397                 if (err < 0)
398                         goto chain_put;
399         }


like mentioned earlier, 'hitimer' can not be canceled/re-initialized easily when
tcf_gate_init() still has a possible error path. And in my understanding
'p->entries' must be consistent when the timer is initialized.

IMO, the correct way to handle 'entries' is to:

- populate the list on a local variable, before taking the spinlock and
allocating the IDR

- assign to p->entries after validation is successful (with the spinlock
taken). Same as what was done with 'cycletime' in patch 1/2, but with the
variable initialized (btw, thanks for catching this), and free the old
list in case of action replace

- release the newly allocated list in the error path of tcf_gate_init()

(but again, this would be a fix for 'entries' - not for 'hitimer', so I
plan to work on it as a separate patch, that fits better 'net-next' rather
than 'net').

-- 
davide

[1] https://elixir.bootlin.com/linux/v5.8-rc1/source/net/sched/act_gate.c#L450
[2] https://elixir.bootlin.com/linux/v5.8-rc1/source/net/sched/act_gate.c#L235


^ permalink raw reply

* [kbuild] Re: [PATCH v3 1/3] net: phy: mscc: move shared probe code into a helper
From: Dan Carpenter @ 2020-06-16 12:27 UTC (permalink / raw)
  To: kbuild, Heiko Stuebner, davem, kuba
  Cc: lkp, kbuild-all, robh+dt, andrew, f.fainelli, hkallweit1, linux,
	netdev, devicetree, linux-kernel
In-Reply-To: <20200615144501.1140870-1-heiko@sntech.de>

[-- Attachment #1: Type: text/plain, Size: 3962 bytes --]

Hi Heiko,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]
[also build test WARNING on sparc-next/master net/master linus/master v5.8-rc1 next-20200616]
[cannot apply to robh/for-next]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Heiko-Stuebner/net-phy-mscc-move-shared-probe-code-into-a-helper/20200615-224727
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git cb8e59cc87201af93dfbb6c3dccc8fcad72a09c2
config: i386-randconfig-m021-20200616 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-13) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
drivers/net/phy/mscc/mscc_main.c:2002 vsc8574_probe() error: potentially dereferencing uninitialized 'vsc8531'.

# https://github.com/0day-ci/linux/commit/5be350208c014ad5e0afd06868ccaaefd6216345
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 5be350208c014ad5e0afd06868ccaaefd6216345
vim +/vsc8531 +2002 drivers/net/phy/mscc/mscc_main.c

5be350208c014a drivers/net/phy/mscc/mscc_main.c Heiko Stuebner 2020-06-15  1984  static int vsc8574_probe(struct phy_device *phydev)
5be350208c014a drivers/net/phy/mscc/mscc_main.c Heiko Stuebner 2020-06-15  1985  {
5be350208c014a drivers/net/phy/mscc/mscc_main.c Heiko Stuebner 2020-06-15  1986  	struct vsc8531_private *vsc8531;
                                                                                                                ^^^^^^^

5be350208c014a drivers/net/phy/mscc/mscc_main.c Heiko Stuebner 2020-06-15  1987  	int rc;
5be350208c014a drivers/net/phy/mscc/mscc_main.c Heiko Stuebner 2020-06-15  1988  	u32 default_mode[4] = {VSC8531_LINK_1000_ACTIVITY,
5be350208c014a drivers/net/phy/mscc/mscc_main.c Heiko Stuebner 2020-06-15  1989  	   VSC8531_LINK_100_ACTIVITY, VSC8531_LINK_ACTIVITY,
5be350208c014a drivers/net/phy/mscc/mscc_main.c Heiko Stuebner 2020-06-15  1990  	   VSC8531_DUPLEX_COLLISION};
5be350208c014a drivers/net/phy/mscc/mscc_main.c Heiko Stuebner 2020-06-15  1991  
5be350208c014a drivers/net/phy/mscc/mscc_main.c Heiko Stuebner 2020-06-15  1992  	rc = vsc85xx_probe_helper(phydev, default_mode,
5be350208c014a drivers/net/phy/mscc/mscc_main.c Heiko Stuebner 2020-06-15  1993  				  ARRAY_SIZE(default_mode),
5be350208c014a drivers/net/phy/mscc/mscc_main.c Heiko Stuebner 2020-06-15  1994  				  VSC8584_SUPP_LED_MODES,
5be350208c014a drivers/net/phy/mscc/mscc_main.c Heiko Stuebner 2020-06-15  1995  				  vsc8584_hw_stats,
5be350208c014a drivers/net/phy/mscc/mscc_main.c Heiko Stuebner 2020-06-15  1996  				  ARRAY_SIZE(vsc8584_hw_stats));
5be350208c014a drivers/net/phy/mscc/mscc_main.c Heiko Stuebner 2020-06-15  1997  	if (rc < 0)
5be350208c014a drivers/net/phy/mscc/mscc_main.c Heiko Stuebner 2020-06-15  1998  		return rc;
00d70d8e0e7811 drivers/net/phy/mscc.c           Quentin Schulz 2018-10-08  1999  
5be350208c014a drivers/net/phy/mscc/mscc_main.c Heiko Stuebner 2020-06-15  2000  	vsc8584_get_base_addr(phydev);
5be350208c014a drivers/net/phy/mscc/mscc_main.c Heiko Stuebner 2020-06-15  2001  	return devm_phy_package_join(&phydev->mdio.dev, phydev,
5be350208c014a drivers/net/phy/mscc/mscc_main.c Heiko Stuebner 2020-06-15 @2002  				     vsc8531->base_addr, 0);
                                                                                                                     ^^^^^^^^^^^^^^^^^^
Not initialized.

00d70d8e0e7811 drivers/net/phy/mscc.c           Quentin Schulz 2018-10-08  2003  }

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32738 bytes --]

[-- Attachment #3: Type: text/plain, Size: 149 bytes --]

_______________________________________________
kbuild mailing list -- kbuild@lists.01.org
To unsubscribe send an email to kbuild-leave@lists.01.org

^ permalink raw reply

* Re: [PATCH 4/4] thunderbolt: Get rid of E2E workaround
From: Mika Westerberg @ 2020-06-16 11:55 UTC (permalink / raw)
  To: Yehezkel Bernat
  Cc: Greg Kroah-Hartman, linux-usb, Michael Jamet, David S . Miller,
	Jakub Kicinski, Andreas Noever, Lukas Wunner, netdev
In-Reply-To: <CA+CmpXtOAUnSdhjwi5HXaJhPzbUUsZZsitFifyhyPk+X2c=wYw@mail.gmail.com>

On Mon, Jun 15, 2020 at 10:54:52PM +0300, Yehezkel Bernat wrote:
> On Mon, Jun 15, 2020 at 6:55 PM Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> >
> > On Mon, Jun 15, 2020 at 06:41:32PM +0300, Yehezkel Bernat wrote:
> > > > I think you are talking about the "prtstns" property in the network
> > > > driver. There we only set TBNET_MATCH_FRAGS_ID (bit 1). This is the
> > > > thing that get exposed to the other side of the connection and we never
> > > > announced support for full E2E.
> > >
> > >
> > > Ah, yes, this one, Thanks!
> > > As Windows driver uses it for flagging full-E2E, and we completely drop E2E
> > > support here, it may worth to mention there that this is what bit 2 is used in
> > > Windows so any reuse should consider the possible compatibility issue.
> >
> > Note we only drop dead code in this patch. It is that workaround for
> > Falcon Ridge controller we actually never used.
> >
> > I can add a comment to the network driver about the full E2E support
> > flag as a separate patch if you think it is useful.
> >
> > The network protocol will be public soon I guess because USB4 spec
> > refers to "USB4 Inter-Domain Specification, Revision 1.0, [to be
> > published] – (USB4 Inter-Domain Specification)" so I would expect it to
> > be explained there as well.
> 
> I see. I leave it for your decision, then.
> Thanks for bearing with me.

OK, I think it makes sense to add the comment so I'll do that as
a separate patch (will probably go next week since I have some other
patches to deal with this week, and Friday is holiday in Finland).

^ permalink raw reply

* Re: [PATCH] e1000e: continue to init phy even when failed to disable ULP
From: Aaron Ma @ 2020-06-16 11:26 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Jeff Kirsher, David S. Miller, Jakub Kicinski,
	moderated list:INTEL ETHERNET DRIVERS, netdev, linux-kernel,
	vitaly.lifshits, sasha.neftin
In-Reply-To: <4CC928F1-02CC-4675-908E-42B26C151FA1@canonical.com>



On 6/16/20 7:23 PM, Kai-Heng Feng wrote:
> 
> 
>> On Jun 16, 2020, at 18:05, Aaron Ma <aaron.ma@canonical.com> wrote:
>>
>> After commit "e1000e: disable s0ix entry and exit flows for ME systems",
>> some ThinkPads always failed to disable ulp by ME.
>> commit "e1000e: Warn if disabling ULP failed" break out of init phy:
>>
>> error log:
>> [   42.364753] e1000e 0000:00:1f.6 enp0s31f6: Failed to disable ULP
>> [   42.524626] e1000e 0000:00:1f.6 enp0s31f6: PHY Wakeup cause - Unicast Packet
>> [   42.822476] e1000e 0000:00:1f.6 enp0s31f6: Hardware Error
>>
>> When disable s0ix, E1000_FWSM_ULP_CFG_DONE will never be 1.
>> If continue to init phy like before, it can work as before.
>> iperf test result good too.
>>
>> Chnage e_warn to e_dbg, in case it confuses.
>>
>> Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
>> ---
>> drivers/net/ethernet/intel/e1000e/ich8lan.c | 3 +--
>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>> index f999cca37a8a..63405819eb83 100644
>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>> @@ -302,8 +302,7 @@ static s32 e1000_init_phy_workarounds_pchlan(struct e1000_hw *hw)
>> 	hw->dev_spec.ich8lan.ulp_state = e1000_ulp_state_unknown;
>> 	ret_val = e1000_disable_ulp_lpt_lp(hw, true);
> 
> If si0x entry isn't enabled, maybe skip calling e1000_disable_ulp_lpt_lp() altogether?
> We can use e1000e_check_me() to check that.
> 

No, s0ix is different feature with ULP mode.

Aaron

>> 	if (ret_val) {
>> -		e_warn("Failed to disable ULP\n");
>> -		goto out;
>> +		e_dbg("Failed to disable ULP\n");
>> 	}
> 
> The change of "e1000e: Warn if disabling ULP failed" is intentional to catch bugs like this.
> 
> Kai-Heng
> 
>>
>> 	ret_val = hw->phy.ops.acquire(hw);
>> -- 
>> 2.26.2
>>
> 

^ permalink raw reply

* Re: [PATCH] e1000e: continue to init phy even when failed to disable ULP
From: Kai-Heng Feng @ 2020-06-16 11:23 UTC (permalink / raw)
  To: Aaron Ma
  Cc: Jeff Kirsher, David S. Miller, Jakub Kicinski,
	moderated list:INTEL ETHERNET DRIVERS, netdev, linux-kernel,
	vitaly.lifshits, sasha.neftin
In-Reply-To: <20200616100512.22512-1-aaron.ma@canonical.com>



> On Jun 16, 2020, at 18:05, Aaron Ma <aaron.ma@canonical.com> wrote:
> 
> After commit "e1000e: disable s0ix entry and exit flows for ME systems",
> some ThinkPads always failed to disable ulp by ME.
> commit "e1000e: Warn if disabling ULP failed" break out of init phy:
> 
> error log:
> [   42.364753] e1000e 0000:00:1f.6 enp0s31f6: Failed to disable ULP
> [   42.524626] e1000e 0000:00:1f.6 enp0s31f6: PHY Wakeup cause - Unicast Packet
> [   42.822476] e1000e 0000:00:1f.6 enp0s31f6: Hardware Error
> 
> When disable s0ix, E1000_FWSM_ULP_CFG_DONE will never be 1.
> If continue to init phy like before, it can work as before.
> iperf test result good too.
> 
> Chnage e_warn to e_dbg, in case it confuses.
> 
> Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
> ---
> drivers/net/ethernet/intel/e1000e/ich8lan.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> index f999cca37a8a..63405819eb83 100644
> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> @@ -302,8 +302,7 @@ static s32 e1000_init_phy_workarounds_pchlan(struct e1000_hw *hw)
> 	hw->dev_spec.ich8lan.ulp_state = e1000_ulp_state_unknown;
> 	ret_val = e1000_disable_ulp_lpt_lp(hw, true);

If si0x entry isn't enabled, maybe skip calling e1000_disable_ulp_lpt_lp() altogether?
We can use e1000e_check_me() to check that.

> 	if (ret_val) {
> -		e_warn("Failed to disable ULP\n");
> -		goto out;
> +		e_dbg("Failed to disable ULP\n");
> 	}

The change of "e1000e: Warn if disabling ULP failed" is intentional to catch bugs like this.

Kai-Heng

> 
> 	ret_val = hw->phy.ops.acquire(hw);
> -- 
> 2.26.2
> 


^ permalink raw reply

* Re: [PATCH net v3 2/4] flow_offload: fix incorrect cb_priv check for flow_block_cb
From: Simon Horman @ 2020-06-16 10:51 UTC (permalink / raw)
  To: wenxu; +Cc: netdev, davem, pablo, vladbu
In-Reply-To: <1592277580-5524-3-git-send-email-wenxu@ucloud.cn>

On Tue, Jun 16, 2020 at 11:19:38AM +0800, wenxu@ucloud.cn wrote:
> From: wenxu <wenxu@ucloud.cn>
> 
> In the function __flow_block_indr_cleanup, The match stataments
> this->cb_priv == cb_priv is always false, the flow_block_cb->cb_priv
> is totally different data with the flow_indr_dev->cb_priv.
> 
> Store the representor cb_priv to the flow_block_cb->indr.cb_priv in
> the driver.
> 
> Fixes: 1fac52da5942 ("net: flow_offload: consolidate indirect flow_block infrastructure")
> Signed-off-by: wenxu <wenxu@ucloud.cn>

Hi Wenxu,

I wonder if this can be resolved by using the cb_ident field of struct
flow_block_cb.

I observe that mlx5e_rep_indr_setup_block() seems to be the only call-site
where the value of the cb_ident parameter of flow_block_cb_alloc() is
per-block rather than per-device. So part of my proposal is to change
that.

The other part of my proposal is to make use of cb_ident in
__flow_block_indr_cleanup(). Which does seem to match the intended
purpose of cb_ident. Perhaps it would also be good to document what
the intended purpose of cb_ident (and the other fields of struct
flow_block_cb) is.

Compile tested only.

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
index a62bcf0cf512..4de6fcae5252 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
@@ -438,7 +438,7 @@ mlx5e_rep_indr_setup_block(struct net_device *netdev,
 		list_add(&indr_priv->list,
 			 &rpriv->uplink_priv.tc_indr_block_priv_list);
 
-		block_cb = flow_block_cb_alloc(setup_cb, indr_priv, indr_priv,
+		block_cb = flow_block_cb_alloc(setup_cb, rpriv, indr_priv,
 					       mlx5e_rep_indr_block_unbind);
 		if (IS_ERR(block_cb)) {
 			list_del(&indr_priv->list);
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index b288d2f03789..d281fb182894 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -373,14 +373,13 @@ int flow_indr_dev_register(flow_indr_block_bind_cb_t *cb, void *cb_priv)
 EXPORT_SYMBOL(flow_indr_dev_register);
 
 static void __flow_block_indr_cleanup(void (*release)(void *cb_priv),
-				      void *cb_priv,
+				      void *cb_ident,
 				      struct list_head *cleanup_list)
 {
 	struct flow_block_cb *this, *next;
 
 	list_for_each_entry_safe(this, next, &flow_block_indr_list, indr.list) {
-		if (this->release == release &&
-		    this->cb_priv == cb_priv) {
+		if (this->release == release && this->cb_ident == cb_ident) {
 			list_move(&this->indr.list, cleanup_list);
 			return;
 		}

^ permalink raw reply related

* Re: [PATCH net] net: core: reduce recursion limit value
From: Taehee Yoo @ 2020-06-16 10:49 UTC (permalink / raw)
  To: Florian Westphal; +Cc: David Miller, Jakub Kicinski, Netdev
In-Reply-To: <20200615152148.GE16460@breakpoint.cc>

On Tue, 16 Jun 2020 at 00:21, Florian Westphal <fw@strlen.de> wrote:
>

Hi Florian,
Thank you for the review!

> Taehee Yoo <ap420073@gmail.com> wrote:
> > In the current code, ->ndo_start_xmit() can be executed recursively only
> > 10 times because of stack memory.
> > But, in the case of the vxlan, 10 recursion limit value results in
> > a stack overflow.
> [..]
>
> > Fixes: 97cdcf37b57e ("net: place xmit recursion in softnet data")
>
> That commit did not change the recursion limit,
> 11a766ce915fc9f86 ("net: Increase xmit RECURSION_LIMIT to 10.") did?

You're right.
I will send a v2 patch.

Thanks!

^ permalink raw reply

* [PATCH rdma-rc] RDMA/mlx5: Fix missed RST2INIT and INIT2INIT steps during ECE handshake
From: Leon Romanovsky @ 2020-06-16 10:45 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, Mark Zhang, netdev, Saeed Mahameed

From: Leon Romanovsky <leonro@mellanox.com>

Missed step during ECE handshake left userspace application with less
options for the ECE handshake with a need to do workarounds.

Fixes: 50aec2c3135e ("RDMA/mlx5: Return ECE data after modify QP")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/qpc.c |  8 ++++++++
 include/linux/mlx5/mlx5_ifc.h    | 10 ++++++----
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qpc.c b/drivers/infiniband/hw/mlx5/qpc.c
index c19d91d6dce8..7c3968ef9cd1 100644
--- a/drivers/infiniband/hw/mlx5/qpc.c
+++ b/drivers/infiniband/hw/mlx5/qpc.c
@@ -346,6 +346,9 @@ static int get_ece_from_mbox(void *out, u16 opcode)
 	int ece = 0;

 	switch (opcode) {
+	case MLX5_CMD_OP_INIT2INIT_QP:
+		ece = MLX5_GET(init2init_qp_out, out, ece);
+		break;
 	case MLX5_CMD_OP_INIT2RTR_QP:
 		ece = MLX5_GET(init2rtr_qp_out, out, ece);
 		break;
@@ -355,6 +358,9 @@ static int get_ece_from_mbox(void *out, u16 opcode)
 	case MLX5_CMD_OP_RTS2RTS_QP:
 		ece = MLX5_GET(rts2rts_qp_out, out, ece);
 		break;
+	case MLX5_CMD_OP_RST2INIT_QP:
+		ece = MLX5_GET(rst2init_qp_out, out, ece);
+		break;
 	default:
 		break;
 	}
@@ -406,6 +412,7 @@ static int modify_qp_mbox_alloc(struct mlx5_core_dev *dev, u16 opcode, int qpn,
 			return -ENOMEM;
 		MOD_QP_IN_SET_QPC(rst2init_qp, mbox->in, opcode, qpn,
 				  opt_param_mask, qpc, uid);
+		MLX5_SET(rst2init_qp_in, mbox->in, ece, ece);
 		break;
 	case MLX5_CMD_OP_INIT2RTR_QP:
 		if (MBOX_ALLOC(mbox, init2rtr_qp))
@@ -439,6 +446,7 @@ static int modify_qp_mbox_alloc(struct mlx5_core_dev *dev, u16 opcode, int qpn,
 			return -ENOMEM;
 		MOD_QP_IN_SET_QPC(init2init_qp, mbox->in, opcode, qpn,
 				  opt_param_mask, qpc, uid);
+		MLX5_SET(init2init_qp_in, mbox->in, ece, ece);
 		break;
 	default:
 		return -EINVAL;
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 66aeaf113995..7a00110f2f01 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -4286,7 +4286,8 @@ struct mlx5_ifc_rst2init_qp_out_bits {

 	u8         syndrome[0x20];

-	u8         reserved_at_40[0x40];
+	u8         reserved_at_40[0x20];
+	u8         ece[0x20];
 };

 struct mlx5_ifc_rst2init_qp_in_bits {
@@ -4303,7 +4304,7 @@ struct mlx5_ifc_rst2init_qp_in_bits {

 	u8         opt_param_mask[0x20];

-	u8         reserved_at_a0[0x20];
+	u8         ece[0x20];

 	struct mlx5_ifc_qpc_bits qpc;

@@ -6622,7 +6623,8 @@ struct mlx5_ifc_init2init_qp_out_bits {

 	u8         syndrome[0x20];

-	u8         reserved_at_40[0x40];
+	u8         reserved_at_40[0x20];
+	u8         ece[0x20];
 };

 struct mlx5_ifc_init2init_qp_in_bits {
@@ -6639,7 +6641,7 @@ struct mlx5_ifc_init2init_qp_in_bits {

 	u8         opt_param_mask[0x20];

-	u8         reserved_at_a0[0x20];
+	u8         ece[0x20];

 	struct mlx5_ifc_qpc_bits qpc;

--
2.26.2


^ permalink raw reply related

* [PATCH mlx5-next v2 02/11] net/mlx5: Add support in query QP, CQ and MKEY segments
From: Leon Romanovsky @ 2020-06-16 10:39 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Maor Gottlieb, Jakub Kicinski, linux-rdma, netdev, Saeed Mahameed
In-Reply-To: <20200616104006.2425549-1-leon@kernel.org>

From: Maor Gottlieb <maorg@mellanox.com>

Introduce new resource dump segments - PRM_QUERY_QP,
PRM_QUERY_CQ and PRM_QUERY_MKEY. These segments contains the resource
dump in PRM query format.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.c | 3 +++
 include/linux/mlx5/rsc_dump.h                           | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.c
index 10218c2324cc..4924a5658853 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.c
@@ -23,6 +23,9 @@ static const char *const mlx5_rsc_sgmt_name[] = {
 	MLX5_SGMT_STR_ASSING(SX_SLICE_ALL),
 	MLX5_SGMT_STR_ASSING(RDB),
 	MLX5_SGMT_STR_ASSING(RX_SLICE_ALL),
+	MLX5_SGMT_STR_ASSING(PRM_QUERY_QP),
+	MLX5_SGMT_STR_ASSING(PRM_QUERY_CQ),
+	MLX5_SGMT_STR_ASSING(PRM_QUERY_MKEY),
 };
 
 struct mlx5_rsc_dump {
diff --git a/include/linux/mlx5/rsc_dump.h b/include/linux/mlx5/rsc_dump.h
index 87415fa754fe..d11c0b228620 100644
--- a/include/linux/mlx5/rsc_dump.h
+++ b/include/linux/mlx5/rsc_dump.h
@@ -23,6 +23,9 @@ enum mlx5_sgmt_type {
 	MLX5_SGMT_TYPE_SX_SLICE_ALL,
 	MLX5_SGMT_TYPE_RDB,
 	MLX5_SGMT_TYPE_RX_SLICE_ALL,
+	MLX5_SGMT_TYPE_PRM_QUERY_QP,
+	MLX5_SGMT_TYPE_PRM_QUERY_CQ,
+	MLX5_SGMT_TYPE_PRM_QUERY_MKEY,
 	MLX5_SGMT_TYPE_MENU,
 	MLX5_SGMT_TYPE_TERMINATE,
 
-- 
2.26.2


^ permalink raw reply related

* [PATCH mlx5-next v2 01/11] net/mlx5: Export resource dump interface
From: Leon Romanovsky @ 2020-06-16 10:39 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Maor Gottlieb, Jakub Kicinski, linux-rdma, netdev, Saeed Mahameed
In-Reply-To: <20200616104006.2425549-1-leon@kernel.org>

From: Maor Gottlieb <maorg@mellanox.com>

Export some of the resource dump API. mlx5_ib driver will use
it in downstream patches.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 .../mellanox/mlx5/core/diag/rsc_dump.c        |  3 ++
 .../mellanox/mlx5/core/diag/rsc_dump.h        | 33 +------------------
 .../diag => include/linux/mlx5}/rsc_dump.h    | 22 ++++---------
 3 files changed, 10 insertions(+), 48 deletions(-)
 copy {drivers/net/ethernet/mellanox/mlx5/core/diag => include/linux/mlx5}/rsc_dump.h (68%)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.c
index 17ab7efe693d..10218c2324cc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.c
@@ -130,11 +130,13 @@ struct mlx5_rsc_dump_cmd *mlx5_rsc_dump_cmd_create(struct mlx5_core_dev *dev,
 	cmd->mem_size = key->size;
 	return cmd;
 }
+EXPORT_SYMBOL(mlx5_rsc_dump_cmd_create);
 
 void mlx5_rsc_dump_cmd_destroy(struct mlx5_rsc_dump_cmd *cmd)
 {
 	kfree(cmd);
 }
+EXPORT_SYMBOL(mlx5_rsc_dump_cmd_destroy);
 
 int mlx5_rsc_dump_next(struct mlx5_core_dev *dev, struct mlx5_rsc_dump_cmd *cmd,
 		       struct page *page, int *size)
@@ -155,6 +157,7 @@ int mlx5_rsc_dump_next(struct mlx5_core_dev *dev, struct mlx5_rsc_dump_cmd *cmd,
 
 	return more_dump;
 }
+EXPORT_SYMBOL(mlx5_rsc_dump_next);
 
 #define MLX5_RSC_DUMP_MENU_SEGMENT 0xffff
 static int mlx5_rsc_dump_menu(struct mlx5_core_dev *dev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h b/drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h
index 148270073e71..64c4956db6d2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h
@@ -4,41 +4,10 @@
 #ifndef __MLX5_RSC_DUMP_H
 #define __MLX5_RSC_DUMP_H
 
+#include <linux/mlx5/rsc_dump.h>
 #include <linux/mlx5/driver.h>
 #include "mlx5_core.h"
 
-enum mlx5_sgmt_type {
-	MLX5_SGMT_TYPE_HW_CQPC,
-	MLX5_SGMT_TYPE_HW_SQPC,
-	MLX5_SGMT_TYPE_HW_RQPC,
-	MLX5_SGMT_TYPE_FULL_SRQC,
-	MLX5_SGMT_TYPE_FULL_CQC,
-	MLX5_SGMT_TYPE_FULL_EQC,
-	MLX5_SGMT_TYPE_FULL_QPC,
-	MLX5_SGMT_TYPE_SND_BUFF,
-	MLX5_SGMT_TYPE_RCV_BUFF,
-	MLX5_SGMT_TYPE_SRQ_BUFF,
-	MLX5_SGMT_TYPE_CQ_BUFF,
-	MLX5_SGMT_TYPE_EQ_BUFF,
-	MLX5_SGMT_TYPE_SX_SLICE,
-	MLX5_SGMT_TYPE_SX_SLICE_ALL,
-	MLX5_SGMT_TYPE_RDB,
-	MLX5_SGMT_TYPE_RX_SLICE_ALL,
-	MLX5_SGMT_TYPE_MENU,
-	MLX5_SGMT_TYPE_TERMINATE,
-
-	MLX5_SGMT_TYPE_NUM, /* Keep last */
-};
-
-struct mlx5_rsc_key {
-	enum mlx5_sgmt_type rsc;
-	int index1;
-	int index2;
-	int num_of_obj1;
-	int num_of_obj2;
-	int size;
-};
-
 #define MLX5_RSC_DUMP_ALL 0xFFFF
 struct mlx5_rsc_dump_cmd;
 struct mlx5_rsc_dump;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h b/include/linux/mlx5/rsc_dump.h
similarity index 68%
copy from drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h
copy to include/linux/mlx5/rsc_dump.h
index 148270073e71..87415fa754fe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h
+++ b/include/linux/mlx5/rsc_dump.h
@@ -1,11 +1,10 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/* Copyright (c) 2019 Mellanox Technologies. */
-
-#ifndef __MLX5_RSC_DUMP_H
-#define __MLX5_RSC_DUMP_H
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2020 Mellanox Technologies inc. */
 
 #include <linux/mlx5/driver.h>
-#include "mlx5_core.h"
+
+#ifndef __MLX5_RSC_DUMP
+#define __MLX5_RSC_DUMP
 
 enum mlx5_sgmt_type {
 	MLX5_SGMT_TYPE_HW_CQPC,
@@ -39,20 +38,11 @@ struct mlx5_rsc_key {
 	int size;
 };
 
-#define MLX5_RSC_DUMP_ALL 0xFFFF
 struct mlx5_rsc_dump_cmd;
-struct mlx5_rsc_dump;
-
-struct mlx5_rsc_dump *mlx5_rsc_dump_create(struct mlx5_core_dev *dev);
-void mlx5_rsc_dump_destroy(struct mlx5_core_dev *dev);
-
-int mlx5_rsc_dump_init(struct mlx5_core_dev *dev);
-void mlx5_rsc_dump_cleanup(struct mlx5_core_dev *dev);
 
 struct mlx5_rsc_dump_cmd *mlx5_rsc_dump_cmd_create(struct mlx5_core_dev *dev,
 						   struct mlx5_rsc_key *key);
 void mlx5_rsc_dump_cmd_destroy(struct mlx5_rsc_dump_cmd *cmd);
-
 int mlx5_rsc_dump_next(struct mlx5_core_dev *dev, struct mlx5_rsc_dump_cmd *cmd,
 		       struct page *page, int *size);
-#endif
+#endif /* __MLX5_RSC_DUMP */
-- 
2.26.2


^ permalink raw reply related

* [PATCH rdma-next v2 00/11] RAW format dumps through RDMAtool
From: Leon Romanovsky @ 2020-06-16 10:39 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Jakub Kicinski, Lijun Ou, linux-rdma,
	Maor Gottlieb, netdev, Potnuri Bharat Teja, Saeed Mahameed,
	Weihang Li, Wei Hu(Xavier)

From: Leon Romanovsky <leonro@mellanox.com>

Changelog:
v2:
 * Converted to specific nldev ops for RAW.
 * Rebased on top of v5.8-rc1.
v1: https://lore.kernel.org/linux-rdma/20200527135408.480878-1-leon@kernel.org
 * Maor dropped controversial change to dummy interface.
v0:
https://lore.kernel.org/linux-rdma/20200513095034.208385-1-leon@kernel.org

------------------------------------------------------------------------------

Hi,

The following series adds support to get the RDMA resource data in RAW
format. The main motivation for doing this is to enable vendors to return
the entire QP/CQ/MR data without a need from the vendor to set each
field separately.

Thanks

Maor Gottlieb (11):
  net/mlx5: Export resource dump interface
  net/mlx5: Add support in query QP, CQ and MKEY segments
  RDMA/core: Don't call fill_res_entry for PD
  RDMA: Add dedicated MR resource tracker function
  RDMA: Add a dedicated CQ resource tracker function
  RDMA: Add dedicated QP resource tracker function
  RDMA: Add dedicated CM_ID resource tracker function
  RDMA: Add support to dump resource tracker in RAW format
  RDMA/mlx5: Add support to get QP resource in RAW format
  RDMA/mlx5: Add support to get CQ resource in RAW format
  RDMA/mlx5: Add support to get MR resource in RAW format

 drivers/infiniband/core/device.c              |  10 +-
 drivers/infiniband/core/nldev.c               | 176 +++++++++++-------
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h        |   7 +-
 drivers/infiniband/hw/cxgb4/provider.c        |  11 +-
 drivers/infiniband/hw/cxgb4/restrack.c        |  24 +--
 drivers/infiniband/hw/hns/hns_roce_device.h   |   4 +-
 drivers/infiniband/hw/hns/hns_roce_main.c     |   2 +-
 drivers/infiniband/hw/hns/hns_roce_restrack.c |  14 +-
 drivers/infiniband/hw/mlx5/main.c             |   7 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |   9 +-
 drivers/infiniband/hw/mlx5/restrack.c         | 105 +++++++++--
 .../mellanox/mlx5/core/diag/rsc_dump.c        |   6 +
 .../mellanox/mlx5/core/diag/rsc_dump.h        |  33 +---
 .../diag => include/linux/mlx5}/rsc_dump.h    |  25 +--
 include/rdma/ib_verbs.h                       |  13 +-
 include/uapi/rdma/rdma_netlink.h              |   8 +
 16 files changed, 264 insertions(+), 190 deletions(-)
 copy {drivers/net/ethernet/mellanox/mlx5/core/diag => include/linux/mlx5}/rsc_dump.h (68%)

--
2.26.2


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox