Netdev List


* Re: [PATCH v5 net] stmmac: 802.1ad tag stripping fix
From: David Miller @ 2018-06-03 14:33 UTC (permalink / raw)
  To: eladv6
  Cc: makita.toshiaki, Jose.Abreu, f.fainelli, netdev, peppe.cavallaro,
	alexandre.torgue
In-Reply-To: <113191f7-ad35-151f-3414-a2342ff0e13c@gmail.com>

From: Elad Nachman <eladv6@gmail.com>
Date: Wed, 30 May 2018 08:48:25 +0300

>  static void stmmac_rx_vlan(struct net_device *dev, struct sk_buff *skb)
>  {
> -	struct ethhdr *ehdr;
> +	struct vlan_ethhdr *veth;
>  	u16 vlanid;
> +	__be16 vlan_proto;

Please order local variables from longest to shortest line.

>  
> -	if ((dev->features & NETIF_F_HW_VLAN_CTAG_RX) ==
> -	    NETIF_F_HW_VLAN_CTAG_RX &&
> -	    !__vlan_get_tag(skb, &vlanid)) {
> +	if (!__vlan_get_tag(skb, &vlanid)) {
>  		/* pop the vlan tag */
> -		ehdr = (struct ethhdr *)skb->data;
> -		memmove(skb->data + VLAN_HLEN, ehdr, ETH_ALEN * 2);
> +		veth = (struct vlan_ethhdr *)skb->data;
> +		vlan_proto = veth->h_vlan_proto;
> +		memmove(skb->data + VLAN_HLEN, veth, ETH_ALEN * 2);
>  		skb_pull(skb, VLAN_HLEN);
> -		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlanid);
> +		__vlan_hwaccel_put_tag(skb, vlan_proto, vlanid);
>  	}
>  }

I can't see how it is valid to do an unconditional software VLAN
untagging even when VLAN is disabled in the kernel config or the
NETIF_F_* feature bits are not set.

At a minimum that feature test has to stay there, and when it's clear
we let the generic VLAN code untag the packet.
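
To make the point about keeping the feature test concrete, here is a
minimal userspace sketch, not driver code. It assumes a raw Ethernet frame
in a byte buffer; FEAT_VLAN_RX and pop_vlan_if_enabled() are hypothetical
names standing in for the NETIF_F_* test and the driver helper. The tag is
popped only when the feature bit is set; otherwise the frame is left
untouched so that generic code can handle it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN  6	/* bytes per MAC address */
#define VLAN_HLEN 4	/* bytes in one VLAN tag (TPID + TCI) */

/* Hypothetical stand-in for a NETIF_F_* bit; not the kernel's value. */
#define FEAT_VLAN_RX (1u << 0)

/*
 * Pop the outer VLAN tag from frame[0..len) only when the feature bit is
 * set. Returns the new start offset of the frame: VLAN_HLEN if a tag was
 * popped, 0 if the frame was left untouched. On a pop, *tpid and *tci
 * receive the tag's protocol and control fields in host byte order.
 */
static size_t pop_vlan_if_enabled(uint8_t *frame, size_t len,
				  uint32_t features,
				  uint16_t *tpid, uint16_t *tci)
{
	uint16_t proto;

	assert(frame && tpid && tci);

	if (!(features & FEAT_VLAN_RX))
		return 0;	/* feature off: leave the tag in place */
	if (len < 2 * ETH_ALEN + VLAN_HLEN)
		return 0;	/* too short to carry a tag */

	proto = (uint16_t)((frame[12] << 8) | frame[13]);
	if (proto != 0x8100 && proto != 0x88a8)
		return 0;	/* neither 802.1Q nor 802.1ad */

	*tpid = proto;
	*tci = (uint16_t)((frame[14] << 8) | frame[15]);

	/* Slide both MAC addresses forward over the tag. */
	memmove(frame + VLAN_HLEN, frame, 2 * ETH_ALEN);
	return VLAN_HLEN;
}
```

A caller would advance its data pointer by the returned offset, which
plays the role of skb_pull() in the quoted patch.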


* Re: [PATCH net] net: ipv6: prevent use after free in ip6_route_mpath_notify()
From: Eric Dumazet @ 2018-06-03 14:31 UTC (permalink / raw)
  To: David Ahern, Eric Dumazet, David S . Miller; +Cc: netdev
In-Reply-To: <4b46d531-904b-6e5f-67ce-a275f0826d47@cumulusnetworks.com>



On 06/03/2018 07:01 AM, David Ahern wrote:
> On 6/3/18 7:35 AM, Eric Dumazet wrote:
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index f4d61736c41abe8cd7f439c4a37100e90c1eacca..830eefdbdb6734eb81ea0322fb6077ee20be1889 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -4263,7 +4263,9 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
>>  
>>  	err_nh = NULL;
>>  	list_for_each_entry(nh, &rt6_nh_list, next) {
>> +		dst_release(&rt_last->dst);
>>  		rt_last = nh->rt6_info;
>> +		dst_hold(&rt_last->dst);
>>  		err = __ip6_ins_rt(nh->rt6_info, info, &nh->mxc, extack);
>>  		/* save reference to first route for notification */
>>  		if (!rt_notif && !err)
>> @@ -4317,7 +4319,7 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
>>  		list_del(&nh->next);
>>  		kfree(nh);
>>  	}
>> -
>> +	dst_release(&rt_last->dst);
>>  	return err;
>>  }
> 
> Since the rtnl lock is held, a successfully inserted route can not be
> removed until ip6_route_multipath_add finishes. This is a simpler change
> that works with net-next as well:

Your patch changes the intent of your original commit.

It seems you wanted rt_last to point to the last attempted insertion,
not the last successful one?

Or have I misunderstood, and we had not only a use-after-free but also
a semantic error?
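
The difference between the two readings of rt_last can be shown with a
small userspace sketch. It is a toy model under stated assumptions, not
the route.c code: toy_route, toy_insert(), last_attempted() and
last_successful() are hypothetical names, and "freed" is only a flag so
the example can inspect it safely.

```c
#include <assert.h>
#include <stddef.h>

/* Toy route: "freed" is a flag, nothing is really deallocated. */
struct toy_route {
	int id;
	int freed;
};

/* Hypothetical insert: for negative ids it fails and marks the route freed. */
static int toy_insert(struct toy_route *rt)
{
	assert(rt);
	if (rt->id < 0) {
		rt->freed = 1;
		return -1;
	}
	return 0;
}

/* "Last attempted": assigned before the insert, so it may end up freed. */
static struct toy_route *last_attempted(struct toy_route *rts, size_t n)
{
	struct toy_route *last = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		last = &rts[i];
		(void)toy_insert(&rts[i]);
	}
	return last;
}

/* "Last successful": assigned only after the insert returned 0. */
static struct toy_route *last_successful(struct toy_route *rts, size_t n)
{
	struct toy_route *last = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!toy_insert(&rts[i]))
			last = &rts[i];
	}
	return last;
}
```

With routes whose ids are 1, 2 and -3, the first variant ends up pointing
at the failed (and, in the toy model, freed) entry, while the second
points at the entry with id 2.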


> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index f4d61736c41a..1684197c189f 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -4263,11 +4263,12 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
> 
>         err_nh = NULL;
>         list_for_each_entry(nh, &rt6_nh_list, next) {
> -               rt_last = nh->rt6_info;
>                 err = __ip6_ins_rt(nh->rt6_info, info, &nh->mxc, extack);
>                 /* save reference to first route for notification */
>                 if (!rt_notif && !err)
>                         rt_notif = nh->rt6_info;
> +               if (!err)
> +                       rt_last = nh->rt6_info;
> 
>                 /* nh->rt6_info is used or freed at this point, reset to NULL*/
>                 nh->rt6_info = NULL;
> 
> 
> Is there a reproducer for the syzbot case?

Not yet.


* Re: [PATCH 0/9,v2] Netfilter updates for net-next
From: David Miller @ 2018-06-03 14:31 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <20180602232539.10574-1-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Sun,  3 Jun 2018 01:25:39 +0200

> The following patchset contains Netfilter updates for your net-next tree:
> 
> 1) Get rid of nf_sk_is_transparent(), use inet_sk_transparent() instead.
>    From Máté Eckl.
> 
> 2) Move shared tproxy infrastructure to nf_tproxy_ipv4 and nf_tproxy_ipv6.
>    Also from Máté.
> 
> 3) Add hashtable to speed up chain lookups by name, from Florian Westphal.
> 
> 4) Patch series to add connlimit support reusing part of the
>    nf_conncount infrastructure. This includes preparation changes such as
>    passing context to the object and expression destroy interface;
>    garbage collection for expressions embedded into set elements, and
>    the introduction of the clone_destroy interface for expressions.
> 
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Pulled, thanks.


* Re: [PATCH net] net: ipv6: prevent use after free in ip6_route_mpath_notify()
From: David Ahern @ 2018-06-03 14:01 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller; +Cc: netdev, Eric Dumazet
In-Reply-To: <20180603133546.28635-1-edumazet@google.com>

On 6/3/18 7:35 AM, Eric Dumazet wrote:
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index f4d61736c41abe8cd7f439c4a37100e90c1eacca..830eefdbdb6734eb81ea0322fb6077ee20be1889 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -4263,7 +4263,9 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
>  
>  	err_nh = NULL;
>  	list_for_each_entry(nh, &rt6_nh_list, next) {
> +		dst_release(&rt_last->dst);
>  		rt_last = nh->rt6_info;
> +		dst_hold(&rt_last->dst);
>  		err = __ip6_ins_rt(nh->rt6_info, info, &nh->mxc, extack);
>  		/* save reference to first route for notification */
>  		if (!rt_notif && !err)
> @@ -4317,7 +4319,7 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
>  		list_del(&nh->next);
>  		kfree(nh);
>  	}
> -
> +	dst_release(&rt_last->dst);
>  	return err;
>  }

Since the rtnl lock is held, a successfully inserted route can not be
removed until ip6_route_multipath_add finishes. This is a simpler change
that works with net-next as well:

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f4d61736c41a..1684197c189f 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4263,11 +4263,12 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,

        err_nh = NULL;
        list_for_each_entry(nh, &rt6_nh_list, next) {
-               rt_last = nh->rt6_info;
                err = __ip6_ins_rt(nh->rt6_info, info, &nh->mxc, extack);
                /* save reference to first route for notification */
                if (!rt_notif && !err)
                        rt_notif = nh->rt6_info;
+               if (!err)
+                       rt_last = nh->rt6_info;

                /* nh->rt6_info is used or freed at this point, reset to NULL*/
                nh->rt6_info = NULL;


Is there a reproducer for the syzbot case?


* [PATCH net] net: ipv6: prevent use after free in ip6_route_mpath_notify()
From: Eric Dumazet @ 2018-06-03 13:35 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet, David Ahern

syzbot reported a use-after-free [1]

The issue here is that rt_last might have been freed already.
We need to grab a refcount on it to prevent this.
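
As a rough illustration of why the extra reference helps, here is a toy
userspace model, not the dst API: toy_dst, toy_hold(), toy_release() and
toy_failing_insert() are hypothetical names. The insert fails and drops
the reference it was handed, as the failure path in the trace below does;
holding one more reference across the call keeps the object alive until
the caller is done with it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy refcounted object: "destroyed" is a flag, nothing is really freed. */
struct toy_dst {
	int refcnt;
	int destroyed;
};

static void toy_hold(struct toy_dst *d)
{
	if (d)
		d->refcnt++;
}

static void toy_release(struct toy_dst *d)
{
	if (!d)
		return;
	assert(d->refcnt > 0);
	if (--d->refcnt == 0)
		d->destroyed = 1;
}

/* Hypothetical insert that always fails and drops the reference it was given. */
static int toy_failing_insert(struct toy_dst *d)
{
	toy_release(d);
	return -1;
}

/* Returns 1 if d was still alive right after the failed insert. */
static int alive_after_insert(struct toy_dst *d, int take_extra_ref)
{
	int alive;

	if (take_extra_ref)
		toy_hold(d);
	(void)toy_failing_insert(d);
	alive = !d->destroyed;	/* the point where rt_last would be read */
	if (take_extra_ref)
		toy_release(d);
	return alive;
}
```

Starting from a reference count of 1, the object is already destroyed
after the failed insert unless the caller took the extra reference first.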

[1]
BUG: KASAN: use-after-free in ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
Read of size 4 at addr ffff8801bf789cf0 by task syz-executor756/4555

CPU: 1 PID: 4555 Comm: syz-executor756 Not tainted 4.17.0-rc7+ #78
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
 ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
 ip6_route_multipath_add+0x615/0x1910 net/ipv6/route.c:4303
 inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
 rtnetlink_rcv_msg+0x466/0xc10 net/core/rtnetlink.c:4646
 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448
 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4664
 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
 netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336
 netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901
 sock_sendmsg_nosec net/socket.c:629 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:639
 ___sys_sendmsg+0x805/0x940 net/socket.c:2117
 __sys_sendmsg+0x115/0x270 net/socket.c:2155
 __do_sys_sendmsg net/socket.c:2164 [inline]
 __se_sys_sendmsg net/socket.c:2162 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x441819
RSP: 002b:00007ffe841e19d8 EFLAGS: 00000217 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441819
RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000004
RBP: 00000000006cd018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000402510
R13: 00000000004025a0 R14: 0000000000000000 R15: 0000000000000000

Allocated by task 4555:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
 dst_alloc+0xbb/0x1d0 net/core/dst.c:104
 __ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:361
 ip6_dst_alloc+0x29/0xb0 net/ipv6/route.c:376
 ip6_route_info_create+0x4d4/0x3a30 net/ipv6/route.c:2834
 ip6_route_multipath_add+0xc7e/0x1910 net/ipv6/route.c:4240
 inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
 rtnetlink_rcv_msg+0x466/0xc10 net/core/rtnetlink.c:4646
 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448
 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4664
 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
 netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336
 netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901
 sock_sendmsg_nosec net/socket.c:629 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:639
 ___sys_sendmsg+0x805/0x940 net/socket.c:2117
 __sys_sendmsg+0x115/0x270 net/socket.c:2155
 __do_sys_sendmsg net/socket.c:2164 [inline]
 __se_sys_sendmsg net/socket.c:2162 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 4555:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
 dst_destroy+0x267/0x3c0 net/core/dst.c:140
 dst_release_immediate+0x71/0x9e net/core/dst.c:205
 fib6_add+0xa40/0x1650 net/ipv6/ip6_fib.c:1305
 __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1011
 ip6_route_multipath_add+0x513/0x1910 net/ipv6/route.c:4267
 inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
 rtnetlink_rcv_msg+0x466/0xc10 net/core/rtnetlink.c:4646
 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448
 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4664
 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
 netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336
 netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901
 sock_sendmsg_nosec net/socket.c:629 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:639
 ___sys_sendmsg+0x805/0x940 net/socket.c:2117
 __sys_sendmsg+0x115/0x270 net/socket.c:2155
 __do_sys_sendmsg net/socket.c:2164 [inline]
 __se_sys_sendmsg net/socket.c:2162 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801bf789c40
 which belongs to the cache ip6_dst_cache of size 320
The buggy address is located 176 bytes inside of
 320-byte region [ffff8801bf789c40, ffff8801bf789d80)
The buggy address belongs to the page:
page:ffffea0006fde240 count:1 mapcount:0 mapping:ffff8801bf789040 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffff8801bf789040 0000000000000000 000000010000000a
raw: ffffea0006f92f20 ffff8801cd9e7248 ffff8801cda00c40 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8801bf789b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8801bf789c00: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>ffff8801bf789c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
 ffff8801bf789d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801bf789d80: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00

Fixes: 3b1137fe7482 ("net: ipv6: Change notifications for multipath add to RTA_MULTIPATH")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
 net/ipv6/route.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f4d61736c41abe8cd7f439c4a37100e90c1eacca..830eefdbdb6734eb81ea0322fb6077ee20be1889 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4263,7 +4263,9 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
 
 	err_nh = NULL;
 	list_for_each_entry(nh, &rt6_nh_list, next) {
+		dst_release(&rt_last->dst);
 		rt_last = nh->rt6_info;
+		dst_hold(&rt_last->dst);
 		err = __ip6_ins_rt(nh->rt6_info, info, &nh->mxc, extack);
 		/* save reference to first route for notification */
 		if (!rt_notif && !err)
@@ -4317,7 +4319,7 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
 		list_del(&nh->next);
 		kfree(nh);
 	}
-
+	dst_release(&rt_last->dst);
 	return err;
 }
 
-- 
2.17.1.1185.g55be947832-goog


* [PATCH 2/2] net: ethernet: bnx2: Replace NULL comparison
From: Varsha Rao @ 2018-06-03 11:49 UTC (permalink / raw)
  To: Rasesh Mody, Harish Patil, Dept-GELinuxNICDev, David S. Miller,
	netdev, linux-kernel, Nicholas Mc Guire, Lukas Bulwahn
  Cc: Varsha Rao
In-Reply-To: <cover.1528025568.git.rvarsha016@gmail.com>

This patch fixes the checkpatch issue of NULL comparison. Replace x == NULL
with !x, by using the following coccinelle script:

@disable is_null@
expression e;
@@
-e==NULL
+!e

Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
---
 drivers/net/ethernet/broadcom/bnx2.c | 42 ++++++++++++++--------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index 2306523778d4..3853296d78c1 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -384,7 +384,7 @@ static int bnx2_register_cnic(struct net_device *dev, struct cnic_ops *ops,
 	struct bnx2 *bp = netdev_priv(dev);
 	struct cnic_eth_dev *cp = &bp->cnic_eth_dev;
 
-	if (ops == NULL)
+	if (!ops)
 		return -EINVAL;
 
 	if (cp->drv_state & CNIC_DRV_STATE_REGD)
@@ -755,13 +755,13 @@ bnx2_alloc_tx_mem(struct bnx2 *bp)
 		struct bnx2_tx_ring_info *txr = &bnapi->tx_ring;
 
 		txr->tx_buf_ring = kzalloc(SW_TXBD_RING_SIZE, GFP_KERNEL);
-		if (txr->tx_buf_ring == NULL)
+		if (!txr->tx_buf_ring)
 			return -ENOMEM;
 
 		txr->tx_desc_ring =
 			dma_alloc_coherent(&bp->pdev->dev, TXBD_RING_SIZE,
 					   &txr->tx_desc_mapping, GFP_KERNEL);
-		if (txr->tx_desc_ring == NULL)
+		if (!txr->tx_desc_ring)
 			return -ENOMEM;
 	}
 	return 0;
@@ -779,7 +779,7 @@ bnx2_alloc_rx_mem(struct bnx2 *bp)
 
 		rxr->rx_buf_ring =
 			vzalloc(SW_RXBD_RING_SIZE * bp->rx_max_ring);
-		if (rxr->rx_buf_ring == NULL)
+		if (!rxr->rx_buf_ring)
 			return -ENOMEM;
 
 		for (j = 0; j < bp->rx_max_ring; j++) {
@@ -788,7 +788,7 @@ bnx2_alloc_rx_mem(struct bnx2 *bp)
 						   RXBD_RING_SIZE,
 						   &rxr->rx_desc_mapping[j],
 						   GFP_KERNEL);
-			if (rxr->rx_desc_ring[j] == NULL)
+			if (!rxr->rx_desc_ring[j])
 				return -ENOMEM;
 
 		}
@@ -796,7 +796,7 @@ bnx2_alloc_rx_mem(struct bnx2 *bp)
 		if (bp->rx_pg_ring_size) {
 			rxr->rx_pg_ring = vzalloc(SW_RXPG_RING_SIZE *
 						  bp->rx_max_pg_ring);
-			if (rxr->rx_pg_ring == NULL)
+			if (!rxr->rx_pg_ring)
 				return -ENOMEM;
 
 		}
@@ -807,7 +807,7 @@ bnx2_alloc_rx_mem(struct bnx2 *bp)
 						   RXBD_RING_SIZE,
 						   &rxr->rx_pg_desc_mapping[j],
 						   GFP_KERNEL);
-			if (rxr->rx_pg_desc_ring[j] == NULL)
+			if (!rxr->rx_pg_desc_ring[j])
 				return -ENOMEM;
 
 		}
@@ -845,7 +845,7 @@ bnx2_alloc_stats_blk(struct net_device *dev)
 				sizeof(struct statistics_block);
 	status_blk = dma_zalloc_coherent(&bp->pdev->dev, bp->status_stats_size,
 					 &bp->status_blk_mapping, GFP_KERNEL);
-	if (status_blk == NULL)
+	if (!status_blk)
 		return -ENOMEM;
 
 	bp->status_blk = status_blk;
@@ -914,7 +914,7 @@ bnx2_alloc_mem(struct bnx2 *bp)
 						BNX2_PAGE_SIZE,
 						&bp->ctx_blk_mapping[i],
 						GFP_KERNEL);
-			if (bp->ctx_blk[i] == NULL)
+			if (!bp->ctx_blk[i])
 				goto alloc_mem_err;
 		}
 	}
@@ -2667,7 +2667,7 @@ bnx2_alloc_bad_rbuf(struct bnx2 *bp)
 	u32 val;
 
 	good_mbuf = kmalloc(512 * sizeof(u16), GFP_KERNEL);
-	if (good_mbuf == NULL)
+	if (!good_mbuf)
 		return -ENOMEM;
 
 	BNX2_WR(bp, BNX2_MISC_ENABLE_SET_BITS,
@@ -3225,7 +3225,7 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 
 		if (len <= bp->rx_copy_thresh) {
 			skb = netdev_alloc_skb(bp->dev, len + 6);
-			if (skb == NULL) {
+			if (!skb) {
 				bnx2_reuse_rx_data(bp, rxr, data, sw_ring_cons,
 						  sw_ring_prod);
 				goto next_rx;
@@ -4561,7 +4561,7 @@ bnx2_nvram_write(struct bnx2 *bp, u32 offset, u8 *data_buf,
 
 	if (align_start || align_end) {
 		align_buf = kmalloc(len32, GFP_KERNEL);
-		if (align_buf == NULL)
+		if (!align_buf)
 			return -ENOMEM;
 		if (align_start) {
 			memcpy(align_buf, start, 4);
@@ -4575,7 +4575,7 @@ bnx2_nvram_write(struct bnx2 *bp, u32 offset, u8 *data_buf,
 
 	if (!(bp->flash_info->flags & BNX2_NV_BUFFERED)) {
 		flash_buffer = kmalloc(264, GFP_KERNEL);
-		if (flash_buffer == NULL) {
+		if (!flash_buffer) {
 			rc = -ENOMEM;
 			goto nvram_write_end;
 		}
@@ -5440,7 +5440,7 @@ bnx2_free_tx_skbs(struct bnx2 *bp)
 		struct bnx2_tx_ring_info *txr = &bnapi->tx_ring;
 		int j;
 
-		if (txr->tx_buf_ring == NULL)
+		if (!txr->tx_buf_ring)
 			continue;
 
 		for (j = 0; j < BNX2_TX_DESC_CNT; ) {
@@ -5448,7 +5448,7 @@ bnx2_free_tx_skbs(struct bnx2 *bp)
 			struct sk_buff *skb = tx_buf->skb;
 			int k, last;
 
-			if (skb == NULL) {
+			if (!skb) {
 				j = BNX2_NEXT_TX_BD(j);
 				continue;
 			}
@@ -5485,14 +5485,14 @@ bnx2_free_rx_skbs(struct bnx2 *bp)
 		struct bnx2_rx_ring_info *rxr = &bnapi->rx_ring;
 		int j;
 
-		if (rxr->rx_buf_ring == NULL)
+		if (!rxr->rx_buf_ring)
 			return;
 
 		for (j = 0; j < bp->rx_max_ring_idx; j++) {
 			struct bnx2_sw_bd *rx_buf = &rxr->rx_buf_ring[j];
 			u8 *data = rx_buf->data;
 
-			if (data == NULL)
+			if (!data)
 				continue;
 
 			dma_unmap_single(&bp->pdev->dev,
@@ -6826,7 +6826,7 @@ bnx2_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *net_stats)
 {
 	struct bnx2 *bp = netdev_priv(dev);
 
-	if (bp->stats_blk == NULL)
+	if (!bp->stats_blk)
 		return;
 
 	net_stats->rx_packets =
@@ -7217,7 +7217,7 @@ bnx2_get_eeprom_len(struct net_device *dev)
 {
 	struct bnx2 *bp = netdev_priv(dev);
 
-	if (bp->flash_info == NULL)
+	if (!bp->flash_info)
 		return 0;
 
 	return (int) bp->flash_size;
@@ -7678,7 +7678,7 @@ bnx2_get_ethtool_stats(struct net_device *dev,
 	u32 *temp_stats = (u32 *) bp->temp_stats_blk;
 	u8 *stats_len_arr = NULL;
 
-	if (hw_stats == NULL) {
+	if (!hw_stats) {
 		memset(buf, 0, sizeof(u64) * BNX2_NUM_STATS);
 		return;
 	}
@@ -8121,7 +8121,7 @@ bnx2_init_board(struct pci_dev *pdev, struct net_device *dev)
 	bp->temp_stats_blk =
 		kzalloc(sizeof(struct statistics_block), GFP_KERNEL);
 
-	if (bp->temp_stats_blk == NULL) {
+	if (!bp->temp_stats_blk) {
 		rc = -ENOMEM;
 		goto err_out;
 	}
-- 
2.17.0


* [PATCH 1/2] net: ethernet: bnx2: Remove extra parentheses
From: Varsha Rao @ 2018-06-03 11:49 UTC (permalink / raw)
  To: Rasesh Mody, Harish Patil, Dept-GELinuxNICDev, David S. Miller,
	netdev, linux-kernel, Nicholas Mc Guire, Lukas Bulwahn
  Cc: Varsha Rao
In-Reply-To: <cover.1528025568.git.rvarsha016@gmail.com>

The following coccinelle script removes extra parentheses to fix the
clang warning of extraneous parentheses.

@disable paren@
identifier i;
expression e;
statement s;
@@
if (
-(i == e)
+i == e
 )
s

Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
---
 drivers/net/ethernet/broadcom/bnx2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index 9ffc4a8c5fc7..2306523778d4 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -3285,7 +3285,7 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 		sw_cons = BNX2_NEXT_RX_BD(sw_cons);
 		sw_prod = BNX2_NEXT_RX_BD(sw_prod);
 
-		if ((rx_pkt == budget))
+		if (rx_pkt == budget)
 			break;
 
 		/* Refresh hw_cons to see if there is new work */
-- 
2.17.0


* [PATCH 0/2] net: bnx2: Fix checkpatch and clang warnings
From: Varsha Rao @ 2018-06-03 11:48 UTC (permalink / raw)
  To: Rasesh Mody, Harish Patil, Dept-GELinuxNICDev, David S. Miller,
	netdev, linux-kernel, Nicholas Mc Guire, Lukas Bulwahn
  Cc: Varsha Rao

This patchset fixes the checkpatch NULL-comparison issue and the clang
warning about extraneous parentheses.

Varsha Rao (2):
  net: ethernet: bnx2: Remove extra parentheses
  net: ethernet: bnx2: Replace NULL comparison

 drivers/net/ethernet/broadcom/bnx2.c | 44 ++++++++++++++--------------
 1 file changed, 22 insertions(+), 22 deletions(-)

-- 
2.17.0


* [PATCH] net: ipw2x00: Replace NULL comparison with !priv
From: Varsha Rao @ 2018-06-03 11:11 UTC (permalink / raw)
  To: Stanislav Yakovlev, Kalle Valo, David S. Miller, linux-wireless,
	netdev, linux-kernel, Nicholas Mc Guire, Lukas Bulwahn
  Cc: Varsha Rao

Remove extra parentheses and replace the NULL comparison with !priv, to fix
the clang warning about extraneous parentheses and the checkpatch issue.
The following coccinelle script is used to fix it.

@disable is_null,paren@
expression e;
statement s;
@@
if (
- (e==NULL)
+!e
 )
s

Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
---
 drivers/net/wireless/intel/ipw2x00/ipw2200.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/intel/ipw2x00/ipw2200.c b/drivers/net/wireless/intel/ipw2x00/ipw2200.c
index 87a5e414c2f7..7d55bb09413b 100644
--- a/drivers/net/wireless/intel/ipw2x00/ipw2200.c
+++ b/drivers/net/wireless/intel/ipw2x00/ipw2200.c
@@ -7112,7 +7112,7 @@ static u32 ipw_qos_get_burst_duration(struct ipw_priv *priv)
 {
 	u32 ret = 0;
 
-	if ((priv == NULL))
+	if (!priv)
 		return 0;
 
 	if (!(priv->ieee->modulation & LIBIPW_OFDM_MODULATION))
-- 
2.17.0


* [PATCH net-next] net: gemini: fix spelling mistake: "it" -> "is"
From: YueHaibing @ 2018-06-03  8:10 UTC (permalink / raw)
  To: davem, linus.walleij; +Cc: netdev, linux-kernel, YueHaibing

Trivial fix to spelling mistake in gemini dev_warn message

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/ethernet/cortina/gemini.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index bd3f6e4..ff9eb45 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -539,7 +539,7 @@ static int gmac_setup_txqs(struct net_device *netdev)
 	}
 
 	if (port->txq_dma_base & ~DMA_Q_BASE_MASK) {
-		dev_warn(geth->dev, "TX queue base it not aligned\n");
+		dev_warn(geth->dev, "TX queue base is not aligned\n");
 		kfree(skb_tab);
 		return -ENOMEM;
 	}
@@ -680,7 +680,7 @@ static int gmac_setup_rxq(struct net_device *netdev)
 	if (!port->rxq_ring)
 		return -ENOMEM;
 	if (port->rxq_dma_base & ~NONTOE_QHDR0_BASE_MASK) {
-		dev_warn(geth->dev, "RX queue base it not aligned\n");
+		dev_warn(geth->dev, "RX queue base is not aligned\n");
 		return -ENOMEM;
 	}
 
@@ -905,7 +905,7 @@ static int geth_setup_freeq(struct gemini_ethernet *geth)
 	if (!geth->freeq_ring)
 		return -ENOMEM;
 	if (geth->freeq_dma_base & ~DMA_Q_BASE_MASK) {
-		dev_warn(geth->dev, "queue ring base it not aligned\n");
+		dev_warn(geth->dev, "queue ring base is not aligned\n");
 		goto err_freeq;
 	}
 
-- 
2.7.0


* [PATCH net-next 1/3] bpf: implement bpf_get_current_cgroup_id() helper
From: Yonghong Song @ 2018-06-03  7:36 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20180603073654.3600598-1-yhs@fb.com>

bpf has been used extensively for tracing. For example, bcc
contains an almost full set of bpf-based tools to trace kernel
and user functions/events. Most tracing tools are currently
either filtered based on pid or system-wide.

Containers have been used quite extensively in industry and
cgroup is often used together to provide resource isolation
and protection. Several processes may run inside the same
container. It is often desirable to get container-level tracing
results as well, e.g. syscall count, function count, I/O
activity, etc.

This patch implements a new helper, bpf_get_current_cgroup_id(),
which will return cgroup id based on the cgroup within which
the current task is running.

A later patch in this series provides an example showing that
userspace can obtain the same cgroup id, so it can
configure a filter or policy in the bpf program based on
the task's cgroup id.

The helper is currently implemented for tracing. It can
be added to other program types as well when needed.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h      |  1 +
 include/uapi/linux/bpf.h |  9 ++++++++-
 kernel/bpf/core.c        |  1 +
 kernel/bpf/helpers.c     | 15 +++++++++++++++
 kernel/trace/bpf_trace.c |  2 ++
 5 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bbe2974..995c3b1 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -746,6 +746,7 @@ extern const struct bpf_func_proto bpf_get_stackid_proto;
 extern const struct bpf_func_proto bpf_get_stack_proto;
 extern const struct bpf_func_proto bpf_sock_map_update_proto;
 extern const struct bpf_func_proto bpf_sock_hash_update_proto;
+extern const struct bpf_func_proto bpf_get_current_cgroup_id_proto;
 
 /* Shared helpers among cBPF and eBPF. */
 void bpf_user_rnd_init_once(void);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 64ac0f7..1108936 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2054,6 +2054,12 @@ union bpf_attr {
  *
  *	Return
  *		0
+ *
+ * u64 bpf_get_current_cgroup_id(void)
+ * 	Return
+ * 		A 64-bit integer containing the current cgroup id based
+ * 		on the cgroup within which the current task is running.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2134,7 +2140,8 @@ union bpf_attr {
 	FN(lwt_seg6_adjust_srh),	\
 	FN(lwt_seg6_action),		\
 	FN(rc_repeat),			\
-	FN(rc_keydown),
+	FN(rc_keydown),			\
+	FN(get_current_cgroup_id),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 527587d..9f14937 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1765,6 +1765,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
 const struct bpf_func_proto bpf_get_current_comm_proto __weak;
 const struct bpf_func_proto bpf_sock_map_update_proto __weak;
 const struct bpf_func_proto bpf_sock_hash_update_proto __weak;
+const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
 
 const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
 {
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 3d24e23..73065e2 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -179,3 +179,18 @@ const struct bpf_func_proto bpf_get_current_comm_proto = {
 	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
 	.arg2_type	= ARG_CONST_SIZE,
 };
+
+#ifdef CONFIG_CGROUPS
+BPF_CALL_0(bpf_get_current_cgroup_id)
+{
+	struct cgroup *cgrp = task_dfl_cgroup(current);
+
+	return cgrp->kn->id.id;
+}
+
+const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
+	.func		= bpf_get_current_cgroup_id,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+};
+#endif
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index af1486d..6e4ade7 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -564,6 +564,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_prandom_u32_proto;
 	case BPF_FUNC_probe_read_str:
 		return &bpf_probe_read_str_proto;
+	case BPF_FUNC_get_current_cgroup_id:
+		return &bpf_get_current_cgroup_id_proto;
 	default:
 		return NULL;
 	}
-- 
2.9.5


* [PATCH net-next 2/3] tools/bpf: sync uapi bpf.h for bpf_get_current_cgroup_id() helper
From: Yonghong Song @ 2018-06-03  7:36 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20180603073654.3600598-1-yhs@fb.com>

Sync kernel uapi/linux/bpf.h with tools uapi/linux/bpf.h.
Also add the necessary helper define in bpf_helpers.h.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/include/uapi/linux/bpf.h            | 9 ++++++++-
 tools/testing/selftests/bpf/bpf_helpers.h | 2 ++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 64ac0f7..1108936 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2054,6 +2054,12 @@ union bpf_attr {
  *
  *	Return
  *		0
+ *
+ * u64 bpf_get_current_cgroup_id(void)
+ * 	Return
+ * 		A 64-bit integer containing the current cgroup id based
+ * 		on the cgroup within which the current task is running.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2134,7 +2140,8 @@ union bpf_attr {
 	FN(lwt_seg6_adjust_srh),	\
 	FN(lwt_seg6_action),		\
 	FN(rc_repeat),			\
-	FN(rc_keydown),
+	FN(rc_keydown),			\
+	FN(get_current_cgroup_id),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index a66a9d9..f2f28b6 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -131,6 +131,8 @@ static int (*bpf_rc_repeat)(void *ctx) =
 static int (*bpf_rc_keydown)(void *ctx, unsigned int protocol,
 			     unsigned long long scancode, unsigned int toggle) =
 	(void *) BPF_FUNC_rc_keydown;
+static unsigned long long (*bpf_get_current_cgroup_id)(void) =
+	(void *) BPF_FUNC_get_current_cgroup_id;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
-- 
2.9.5


* [PATCH net-next 0/3] bpf: implement bpf_get_current_cgroup_id() helper
From: Yonghong Song @ 2018-06-03  7:36 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team

bpf has been used extensively for tracing. For example, bcc
contains an almost full set of bpf-based tools to trace kernel
and user functions/events. Most tracing tools are currently
either filtered based on pid or system-wide.

Containers are used extensively in industry, and cgroups are
often used alongside them to provide resource isolation and
protection. Several processes may run inside the same
container. It is often desirable to get container-level tracing
results as well, e.g. syscall count, function count, I/O
activity, etc.

This patch implements a new helper, bpf_get_current_cgroup_id(),
which will return cgroup id based on the cgroup within which
the current task is running.

Patch #1 implements the new helper in the kernel.
Patch #2 syncs the uapi bpf.h header and helper between tools
and kernel.
Patch #3 shows how to get the same cgroup id in user space,
so a filter or policy could be configured in the bpf program
based on current task cgroup.

Yonghong Song (3):
  bpf: implement bpf_get_current_cgroup_id() helper
  tools/bpf: sync uapi bpf.h for bpf_get_current_cgroup_id() helper
  tools/bpf: add a selftest for bpf_get_current_cgroup_id() helper

 include/linux/bpf.h                              |   1 +
 include/uapi/linux/bpf.h                         |   9 +-
 kernel/bpf/core.c                                |   1 +
 kernel/bpf/helpers.c                             |  15 +++
 kernel/trace/bpf_trace.c                         |   2 +
 tools/include/uapi/linux/bpf.h                   |   9 +-
 tools/testing/selftests/bpf/.gitignore           |   1 +
 tools/testing/selftests/bpf/Makefile             |   6 +-
 tools/testing/selftests/bpf/bpf_helpers.h        |   2 +
 tools/testing/selftests/bpf/cgroup_helpers.c     |  57 +++++++++
 tools/testing/selftests/bpf/cgroup_helpers.h     |   1 +
 tools/testing/selftests/bpf/get_cgroup_id_kern.c |  28 +++++
 tools/testing/selftests/bpf/get_cgroup_id_user.c | 141 +++++++++++++++++++++++
 13 files changed, 269 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/get_cgroup_id_kern.c
 create mode 100644 tools/testing/selftests/bpf/get_cgroup_id_user.c

-- 
2.9.5

^ permalink raw reply

* [PATCH net-next 3/3] tools/bpf: add a selftest for bpf_get_current_cgroup_id() helper
From: Yonghong Song @ 2018-06-03  7:36 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20180603073654.3600598-1-yhs@fb.com>

The name_to_handle_at() syscall can be used to get the cgroup id
for a particular cgroup path in user space. The selftest
gets the cgroup id from both user space and the kernel, and
compares them to ensure they are equal.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/testing/selftests/bpf/.gitignore           |   1 +
 tools/testing/selftests/bpf/Makefile             |   6 +-
 tools/testing/selftests/bpf/cgroup_helpers.c     |  57 +++++++++
 tools/testing/selftests/bpf/cgroup_helpers.h     |   1 +
 tools/testing/selftests/bpf/get_cgroup_id_kern.c |  28 +++++
 tools/testing/selftests/bpf/get_cgroup_id_user.c | 141 +++++++++++++++++++++++
 6 files changed, 232 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/get_cgroup_id_kern.c
 create mode 100644 tools/testing/selftests/bpf/get_cgroup_id_user.c

diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index 6ea8359..49938d7 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -18,3 +18,4 @@ urandom_read
 test_btf
 test_sockmap
 test_lirc_mode2_user
+get_cgroup_id_user
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 553d181..607ed87 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -24,7 +24,7 @@ urandom_read: urandom_read.c
 # Order correspond to 'make run_tests' order
 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
 	test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
-	test_sock test_btf test_sockmap test_lirc_mode2_user
+	test_sock test_btf test_sockmap test_lirc_mode2_user get_cgroup_id_user
 
 TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \
 	test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o     \
@@ -34,7 +34,8 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
 	sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o \
 	test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o \
 	test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \
-	test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o
+	test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
+	get_cgroup_id_kern.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
@@ -63,6 +64,7 @@ $(OUTPUT)/test_sock: cgroup_helpers.c
 $(OUTPUT)/test_sock_addr: cgroup_helpers.c
 $(OUTPUT)/test_sockmap: cgroup_helpers.c
 $(OUTPUT)/test_progs: trace_helpers.c
+$(OUTPUT)/get_cgroup_id_user: cgroup_helpers.c
 
 .PHONY: force
 
diff --git a/tools/testing/selftests/bpf/cgroup_helpers.c b/tools/testing/selftests/bpf/cgroup_helpers.c
index f3bca3a..c87b4e0 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.c
+++ b/tools/testing/selftests/bpf/cgroup_helpers.c
@@ -6,6 +6,7 @@
 #include <sys/types.h>
 #include <linux/limits.h>
 #include <stdio.h>
+#include <stdlib.h>
 #include <linux/sched.h>
 #include <fcntl.h>
 #include <unistd.h>
@@ -176,3 +177,59 @@ int create_and_get_cgroup(char *path)
 
 	return fd;
 }
+
+/**
+ * get_cgroup_id() - Get cgroup id for a particular cgroup path
+ * @path: The cgroup path, relative to the workdir, to join
+ *
+ * On success, it returns the cgroup id. On failure it returns 0,
+ * which is an invalid cgroup id.
+ * If there is a failure, it prints the error to stderr.
+ */
+unsigned long long get_cgroup_id(char *path)
+{
+	int dirfd, err, flags, mount_id, fhsize;
+	union {
+		unsigned long long cgid;
+		unsigned char raw_bytes[8];
+	} id;
+	char cgroup_workdir[PATH_MAX + 1];
+	struct file_handle *fhp, *fhp2;
+	unsigned long long ret = 0;
+
+	format_cgroup_path(cgroup_workdir, path);
+
+	dirfd = AT_FDCWD;
+	flags = 0;
+	fhsize = sizeof(*fhp);
+	fhp = calloc(1, fhsize);
+	if (!fhp) {
+		log_err("calloc");
+		return 0;
+	}
+	err = name_to_handle_at(dirfd, cgroup_workdir, fhp, &mount_id, flags);
+	if (err >= 0 || fhp->handle_bytes != 8) {
+		log_err("name_to_handle_at");
+		goto free_mem;
+	}
+
+	fhsize = sizeof(struct file_handle) + fhp->handle_bytes;
+	fhp2 = realloc(fhp, fhsize);
+	if (!fhp2) {
+		log_err("realloc");
+		goto free_mem;
+	}
+	err = name_to_handle_at(dirfd, cgroup_workdir, fhp2, &mount_id, flags);
+	fhp = fhp2;
+	if (err < 0) {
+		log_err("name_to_handle_at");
+		goto free_mem;
+	}
+
+	memcpy(id.raw_bytes, fhp->f_handle, 8);
+	ret = id.cgid;
+
+free_mem:
+	free(fhp);
+	return ret;
+}
diff --git a/tools/testing/selftests/bpf/cgroup_helpers.h b/tools/testing/selftests/bpf/cgroup_helpers.h
index 06485e0..20a4a5d 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.h
+++ b/tools/testing/selftests/bpf/cgroup_helpers.h
@@ -13,5 +13,6 @@ int create_and_get_cgroup(char *path);
 int join_cgroup(char *path);
 int setup_cgroup_environment(void);
 void cleanup_cgroup_environment(void);
+unsigned long long get_cgroup_id(char *path);
 
 #endif
diff --git a/tools/testing/selftests/bpf/get_cgroup_id_kern.c b/tools/testing/selftests/bpf/get_cgroup_id_kern.c
new file mode 100644
index 0000000..2cf8cb2
--- /dev/null
+++ b/tools/testing/selftests/bpf/get_cgroup_id_kern.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Facebook
+
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") cg_ids = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u64),
+	.max_entries = 1,
+};
+
+SEC("tracepoint/syscalls/sys_enter_nanosleep")
+int trace(void *ctx)
+{
+	__u32 key = 0;
+	__u64 *val;
+
+	val = bpf_map_lookup_elem(&cg_ids, &key);
+	if (val)
+		*val = bpf_get_current_cgroup_id();
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1; /* ignored by tracepoints, required by libbpf.a */
diff --git a/tools/testing/selftests/bpf/get_cgroup_id_user.c b/tools/testing/selftests/bpf/get_cgroup_id_user.c
new file mode 100644
index 0000000..ea19a42
--- /dev/null
+++ b/tools/testing/selftests/bpf/get_cgroup_id_user.c
@@ -0,0 +1,141 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Facebook
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <syscall.h>
+#include <unistd.h>
+#include <linux/perf_event.h>
+#include <sys/ioctl.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include <linux/bpf.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "cgroup_helpers.h"
+#include "bpf_rlimit.h"
+
+#define CHECK(condition, tag, format...) ({		\
+	int __ret = !!(condition);			\
+	if (__ret) {					\
+		printf("%s:FAIL:%s ", __func__, tag);	\
+		printf(format);				\
+	} else {					\
+		printf("%s:PASS:%s\n", __func__, tag);	\
+	}						\
+	__ret;						\
+})
+
+static int bpf_find_map(const char *test, struct bpf_object *obj,
+			const char *name)
+{
+	struct bpf_map *map;
+
+	map = bpf_object__find_map_by_name(obj, name);
+	if (!map)
+		return -1;
+	return bpf_map__fd(map);
+}
+
+#define TEST_CGROUP "/test-bpf-get-cgroup-id/"
+
+int main(int argc, char **argv)
+{
+	const char *probe_name = "syscalls/sys_enter_nanosleep";
+	const char *file = "get_cgroup_id_kern.o";
+	int err, bytes, efd, prog_fd, pmu_fd;
+	struct perf_event_attr attr = {};
+	int cgroup_fd, cgidmap_fd;
+	struct bpf_object *obj;
+	__u64 kcgid = 0, ucgid;
+	int exit_code = 1;
+	char buf[256];
+	__u32 key = 0;
+
+	err = setup_cgroup_environment();
+	if (CHECK(err, "setup_cgroup_environment", "err %d errno %d\n", err,
+		  errno))
+		return 1;
+
+	cgroup_fd = create_and_get_cgroup(TEST_CGROUP);
+	if (CHECK(cgroup_fd < 0, "create_and_get_cgroup", "err %d errno %d\n",
+		  cgroup_fd, errno))
+		goto cleanup_cgroup_env;
+
+	err = join_cgroup(TEST_CGROUP);
+	if (CHECK(err, "join_cgroup", "err %d errno %d\n", err, errno))
+		goto cleanup_cgroup_env;
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
+	if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno))
+		goto cleanup_cgroup_env;
+
+	cgidmap_fd = bpf_find_map(__func__, obj, "cg_ids");
+	if (CHECK(cgidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
+		  cgidmap_fd, errno))
+		goto close_prog;
+
+	snprintf(buf, sizeof(buf),
+		 "/sys/kernel/debug/tracing/events/%s/id", probe_name);
+	efd = open(buf, O_RDONLY, 0);
+	if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
+		goto close_prog;
+	bytes = read(efd, buf, sizeof(buf));
+	close(efd);
+	if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read",
+		  "bytes %d errno %d\n", bytes, errno))
+		goto close_prog;
+
+	attr.config = strtol(buf, NULL, 0);
+	attr.type = PERF_TYPE_TRACEPOINT;
+	attr.sample_type = PERF_SAMPLE_RAW;
+	attr.sample_period = 1;
+	attr.wakeup_events = 1;
+
+	/* attach to this pid so that all bpf invocations will be in the
+	 * cgroup associated with this pid.
+	 */
+	pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0);
+	if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd,
+		  errno))
+		goto close_prog;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
+	if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err,
+		  errno))
+		goto close_pmu;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
+	if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err,
+		  errno))
+		goto close_pmu;
+
+	/* trigger some syscalls */
+	sleep(1);
+
+	err = bpf_map_lookup_elem(cgidmap_fd, &key, &kcgid);
+	if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno))
+		goto close_pmu;
+
+	ucgid = get_cgroup_id(TEST_CGROUP);
+	if (CHECK(kcgid != ucgid, "compare_cgroup_id",
+		  "kern cgid %llx user cgid %llx", kcgid, ucgid))
+		goto close_pmu;
+
+	exit_code = 0;
+	printf("%s:PASS\n", argv[0]);
+
+close_pmu:
+	close(pmu_fd);
+close_prog:
+	bpf_object__close(obj);
+cleanup_cgroup_env:
+	cleanup_cgroup_environment();
+	return exit_code;
+}
-- 
2.9.5

^ permalink raw reply related

* Re: [PATCH] net: ethernet: mlx4: Remove unnecessary parentheses
From: Tariq Toukan @ 2018-06-03  7:15 UTC (permalink / raw)
  To: Varsha Rao, Tariq Toukan, David S. Miller, Nicholas Mc Guire,
	Lukas Bulwahn, netdev, linux-rdma, linux-kernel
In-Reply-To: <20180601020049.3704-1-rvarsha016@gmail.com>



On 01/06/2018 5:00 AM, Varsha Rao wrote:
> This patch fixes the clang warning of extraneous parentheses, with the
> following coccinelle script.
> 
> @@
> identifier i;
> expression e;
> statement s;
> @@
> if (
> -(i == e)
> +i == e
>   )
> s
> 
> Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
> Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/port.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/port.c b/drivers/net/ethernet/mellanox/mlx4/port.c
> index 3ef3406ff4cb..10fcc22f4590 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/port.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/port.c
> @@ -614,9 +614,9 @@ int __mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan,
>   		int index_at_dup_port = -1;
>   
>   		for (i = MLX4_VLAN_REGULAR; i < MLX4_MAX_VLAN_NUM; i++) {
> -			if ((vlan == (MLX4_VLAN_MASK & be32_to_cpu(table->entries[i]))))
> +			if (vlan == (MLX4_VLAN_MASK & be32_to_cpu(table->entries[i])))
>   				index_at_port = i;
> -			if ((vlan == (MLX4_VLAN_MASK & be32_to_cpu(dup_table->entries[i]))))
> +			if (vlan == (MLX4_VLAN_MASK & be32_to_cpu(dup_table->entries[i])))
>   				index_at_dup_port = i;
>   		}
>   		/* check that same vlan is not in the tables at different indices */
> 

Acked-by: Tariq Toukan <tariqt@mellanox.com>

Thanks for your patch,
Tariq

^ permalink raw reply

* [PATCH net-next V2 2/2] cls_flower: Fix comparing of old filter mask with new filter
From: Paul Blakey @ 2018-06-03  7:06 UTC (permalink / raw)
  To: Jiri Pirko, Cong Wang, Jamal Hadi Salim, David Miller, netdev
  Cc: Yevgeny Kliteynik, Roi Dayan, Shahar Klein, Mark Bloch,
	Or Gerlitz, Paul Blakey
In-Reply-To: <1528009574-63306-1-git-send-email-paulb@mellanox.com>

We incorrectly compare the mask and the result is that we can't modify
an already existing rule.

Fix that by comparing correctly.
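
For illustration only, the corrected check can be sketched in isolation as
below. This is a simplified userspace model, not the code in cls_flower.c:
struct fl_mask_model, struct fl_filter_model and fl_check_replace_mask() are
hypothetical names, and the branch of fl_check_assign_mask() that looks up or
creates the mask is left out.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the flower mask and filter objects. */
struct fl_mask_model {
	int id;
};

struct fl_filter_model {
	const struct fl_mask_model *mask;
};

/*
 * Model of the fixed condition: when an existing filter (fold) is being
 * replaced, the replacement is rejected only if it uses a different mask.
 * Returns 0 if the replacement may proceed, -EINVAL otherwise.
 */
int fl_check_replace_mask(const struct fl_filter_model *fold,
			  const struct fl_filter_model *fnew)
{
	if (fold && fold->mask != fnew->mask)
		return -EINVAL;

	return 0;
}
```

With the old "==" comparison the result is inverted: replacing a rule with
one that shares its mask, which is the normal way to modify it, is the case
that fails.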

Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority")
Reported-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
---

Changelog: v0 -> v2: rebased.

 net/sched/cls_flower.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 159efd9..2b5be42 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -877,7 +877,7 @@ static int fl_check_assign_mask(struct cls_fl_head *head,
 			return PTR_ERR(newmask);
 
 		fnew->mask = newmask;
-	} else if (fold && fold->mask == fnew->mask) {
+	} else if (fold && fold->mask != fnew->mask) {
 		return -EINVAL;
 	}
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next V2 1/2] cls_flower: Fix missing free of rhashtable
From: Paul Blakey @ 2018-06-03  7:06 UTC (permalink / raw)
  To: Jiri Pirko, Cong Wang, Jamal Hadi Salim, David Miller, netdev
  Cc: Yevgeny Kliteynik, Roi Dayan, Shahar Klein, Mark Bloch,
	Or Gerlitz, Paul Blakey

When destroying the instance, destroy the head rhashtable.

Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority")
Reported-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
---

Changelog: v0 -> v2: rebased.

 net/sched/cls_flower.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 3786fea..159efd9 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -326,6 +326,8 @@ static void fl_destroy_sleepable(struct work_struct *work)
 	struct cls_fl_head *head = container_of(to_rcu_work(work),
 						struct cls_fl_head,
 						rwork);
+
+	rhashtable_destroy(&head->ht);
 	kfree(head);
 	module_put(THIS_MODULE);
 }
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH bpf-next v3 05/11] bpf: avoid retpoline for lookup/update/delete calls on maps
From: Jesper Dangaard Brouer @ 2018-06-03  6:56 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: brouer, alexei.starovoitov, netdev
In-Reply-To: <20180602210641.6163-6-daniel@iogearbox.net>

On Sat,  2 Jun 2018 23:06:35 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> Before:
> 
>   # bpftool p d x i 1

Could this please be changed to:

 # bpftool prog dump xlated id 1

I requested this before, but you seem to have missed my feedback...
This makes the command "self-documenting" and searchable by Google.


>     0: (bf) r2 = r10
>     1: (07) r2 += -8
>     2: (7a) *(u64 *)(r2 +0) = 0
>     3: (18) r1 = map[id:1]
>     5: (85) call __htab_map_lookup_elem#232656
>     6: (15) if r0 == 0x0 goto pc+4
>     7: (71) r1 = *(u8 *)(r0 +35)
>     8: (55) if r1 != 0x0 goto pc+1
>     9: (72) *(u8 *)(r0 +35) = 1
>    10: (07) r0 += 56
>    11: (15) if r0 == 0x0 goto pc+4
>    12: (bf) r2 = r0
>    13: (18) r1 = map[id:1]
>    15: (85) call bpf_map_delete_elem#215008  <-- indirect call via
>    16: (95) exit                                 helper
> 



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH] net: skbuff.h: drop unneeded <linux/slab.h>
From: Randy Dunlap @ 2018-06-03  4:40 UTC (permalink / raw)
  To: netdev@vger.kernel.org, David Miller; +Cc: LKML, Andrew Morton

From: Randy Dunlap <rdunlap@infradead.org>

<linux/skbuff.h> does not use nor need <linux/slab.h>, so drop this
header file from skbuff.h.

<linux/skbuff.h> is currently #included in around 1200 C source and
header files, making it the 31st most-used header file.

Build tested [allmodconfig] on 20 arch-es.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
---
 include/linux/skbuff.h |    2 --
 1 file changed, 2 deletions(-)

--- lnx.orig/include/linux/skbuff.h
+++ lnx.next/include/linux/skbuff.h
@@ -852,8 +852,6 @@ struct sk_buff {
 /*
  *	Handling routines are only of interest to the kernel
  */
-#include <linux/slab.h>
-
 
 #define SKB_ALLOC_FCLONE	0x01
 #define SKB_ALLOC_RX		0x02

^ permalink raw reply

* [PATCH net-next] net: chelsio: Use zeroing memory allocator instead of allocator/memset
From: YueHaibing @ 2018-06-03  2:40 UTC (permalink / raw)
  To: davem, santosh; +Cc: netdev, linux-kernel, ganeshgr, leedom, YueHaibing

Use dma_zalloc_coherent for allocating zeroed
memory and remove unnecessary memset function.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/ethernet/chelsio/cxgb3/sge.c   | 3 +--
 drivers/net/ethernet/chelsio/cxgb4/sge.c   | 3 +--
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c | 7 +------
 3 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb3/sge.c b/drivers/net/ethernet/chelsio/cxgb3/sge.c
index e988caa..20b6e1b 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/sge.c
@@ -620,7 +620,7 @@ static void *alloc_ring(struct pci_dev *pdev, size_t nelem, size_t elem_size,
 {
 	size_t len = nelem * elem_size;
 	void *s = NULL;
-	void *p = dma_alloc_coherent(&pdev->dev, len, phys, GFP_KERNEL);
+	void *p = dma_zalloc_coherent(&pdev->dev, len, phys, GFP_KERNEL);
 
 	if (!p)
 		return NULL;
@@ -633,7 +633,6 @@ static void *alloc_ring(struct pci_dev *pdev, size_t nelem, size_t elem_size,
 		}
 		*(void **)metadata = s;
 	}
-	memset(p, 0, len);
 	return p;
 }
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 276f223..7a271fe 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -694,7 +694,7 @@ static void *alloc_ring(struct device *dev, size_t nelem, size_t elem_size,
 {
 	size_t len = nelem * elem_size + stat_size;
 	void *s = NULL;
-	void *p = dma_alloc_coherent(dev, len, phys, GFP_KERNEL);
+	void *p = dma_zalloc_coherent(dev, len, phys, GFP_KERNEL);
 
 	if (!p)
 		return NULL;
@@ -708,7 +708,6 @@ static void *alloc_ring(struct device *dev, size_t nelem, size_t elem_size,
 	}
 	if (metadata)
 		*(void **)metadata = s;
-	memset(p, 0, len);
 	return p;
 }
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
index dfce5df..3007e1a 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
@@ -756,7 +756,7 @@ static void *alloc_ring(struct device *dev, size_t nelem, size_t hwsize,
 	 * Allocate the hardware ring and PCI DMA bus address space for said.
 	 */
 	size_t hwlen = nelem * hwsize + stat_size;
-	void *hwring = dma_alloc_coherent(dev, hwlen, busaddrp, GFP_KERNEL);
+	void *hwring = dma_zalloc_coherent(dev, hwlen, busaddrp, GFP_KERNEL);
 
 	if (!hwring)
 		return NULL;
@@ -776,11 +776,6 @@ static void *alloc_ring(struct device *dev, size_t nelem, size_t hwsize,
 		*(void **)swringp = swring;
 	}
 
-	/*
-	 * Zero out the hardware ring and return its address as our function
-	 * value.
-	 */
-	memset(hwring, 0, hwlen);
 	return hwring;
 }
 
-- 
2.7.0

^ permalink raw reply related

* [PATCH net] rxrpc: Fix handling of call quietly cancelled out on server
From: David Howells @ 2018-06-03  1:17 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel

Sometimes an in-progress call will stop responding on the fileserver when
the fileserver quietly cancels the call with an internally marked abort
(RX_CALL_DEAD), without sending an ABORT to the client.

This causes the client's call to eventually expire from lack of incoming
packets directed its way, which currently leads to it being cancelled
locally with ETIME.  Note that it's not currently clear as to why this
happens as it's really hard to reproduce.

The rotation policy implemented by kAFS, however, doesn't differentiate
between ETIME meaning we didn't get any response from the server and ETIME
meaning the call got cancelled mid-flow.  The latter leads to an oops when
fetching data as the rotation partially resets the afs_read descriptor,
which can result in a cleared page pointer being dereferenced because that
page has already been filled.

Handle this by the following means:

 (1) Set a flag on a call when we receive a packet for it.

 (2) Store the highest packet serial number so far received for a call
     (bearing in mind this may wrap).

 (3) If, when the "not received anything recently" timeout expires on a
     call, we've received at least one packet for a call and the connection
     as a whole has received packets more recently than that call, then
     cancel the call locally with ECONNRESET rather than ETIME.

     This indicates that the call was definitely in progress on the server.

 (4) In kAFS, if the rotation algorithm sees ECONNRESET rather than ETIME,
     don't try the next server, but rather abort the call.

     This avoids the oops as we don't try to reuse the afs_read struct.
     Rather, as-yet ungotten pages will be reread at a later date.

Also:

 (5) Add an rxrpc tracepoint to log detection of the call being reset.
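
As a rough illustration of steps (1) to (3), the wrap-aware serial comparison
and the choice of error code can be modelled in plain userspace C. This is a
sketch under assumptions, not the rxrpc code: struct call_state,
serial_after(), call_note_rx() and call_expiry_error() are hypothetical
names, and the comparison uses an unsigned subtraction followed by a cast
rather than subtracting two signed ints.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if serial a is later than serial b, allowing for 32-bit wraparound.
 * The unsigned subtraction wraps modulo 2^32; reading the result as a
 * signed value gives the ordering while the serials are less than 2^31
 * apart.
 */
bool serial_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/* Hypothetical per-call state for steps (1) and (2). */
struct call_state {
	bool rx_heard;		/* a packet has been received for this call */
	uint32_t rx_serial;	/* highest serial received for this call */
};

/* Record the serial number of a packet received for the call. */
void call_note_rx(struct call_state *call, uint32_t serial)
{
	if (serial_after(serial, call->rx_serial))
		call->rx_serial = serial;
	call->rx_heard = true;
}

/*
 * Step (3): pick the error when the "nothing received recently" timer
 * expires. If the call has been heard from and the connection has seen a
 * later serial than the call has, treat it as a reset rather than a timeout.
 */
int call_expiry_error(const struct call_state *call, uint32_t conn_hi_serial)
{
	if (call->rx_heard && serial_after(conn_hi_serial, call->rx_serial))
		return -ECONNRESET;

	return -ETIME;
}
```

A call that has never received a packet always gets -ETIME in this model, so
server rotation is still attempted when the server was simply unreachable.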

Without this, I occasionally see an oops like the following:

    general protection fault: 0000 [#1] SMP PTI
    ...
    RIP: 0010:_copy_to_iter+0x204/0x310
    RSP: 0018:ffff8800cae0f828 EFLAGS: 00010206
    RAX: 0000000000000560 RBX: 0000000000000560 RCX: 0000000000000560
    RDX: ffff8800cae0f968 RSI: ffff8800d58b3312 RDI: 0005080000000000
    RBP: ffff8800cae0f968 R08: 0000000000000560 R09: ffff8800ca00f400
    R10: ffff8800c36f28d4 R11: 00000000000008c4 R12: ffff8800cae0f958
    R13: 0000000000000560 R14: ffff8800d58b3312 R15: 0000000000000560
    FS:  00007fdaef108080(0000) GS:ffff8800ca680000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fb28a8fa000 CR3: 00000000d2a76002 CR4: 00000000001606e0
    Call Trace:
     skb_copy_datagram_iter+0x14e/0x289
     rxrpc_recvmsg_data.isra.0+0x6f3/0xf68
     ? trace_buffer_unlock_commit_regs+0x4f/0x89
     rxrpc_kernel_recv_data+0x149/0x421
     afs_extract_data+0x1e0/0x798
     ? afs_wait_for_call_to_complete+0xc9/0x52e
     afs_deliver_fs_fetch_data+0x33a/0x5ab
     afs_deliver_to_call+0x1ee/0x5e0
     ? afs_wait_for_call_to_complete+0xc9/0x52e
     afs_wait_for_call_to_complete+0x12b/0x52e
     ? wake_up_q+0x54/0x54
     afs_make_call+0x287/0x462
     ? afs_fs_fetch_data+0x3e6/0x3ed
     ? rcu_read_lock_sched_held+0x5d/0x63
     afs_fs_fetch_data+0x3e6/0x3ed
     afs_fetch_data+0xbb/0x14a
     afs_readpages+0x317/0x40d
     __do_page_cache_readahead+0x203/0x2ba
     ? ondemand_readahead+0x3a7/0x3c1
     ondemand_readahead+0x3a7/0x3c1
     generic_file_buffered_read+0x18b/0x62f
     __vfs_read+0xdb/0xfe
     vfs_read+0xb2/0x137
     ksys_read+0x50/0x8c
     do_syscall_64+0x7d/0x1a0
     entry_SYSCALL_64_after_hwframe+0x49/0xbe

Note the weird value in RDI which is a result of trying to kmap() a NULL
page pointer.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/rotate.c              |    4 ++++
 include/trace/events/rxrpc.h |   32 ++++++++++++++++++++++++++++++++
 net/rxrpc/ar-internal.h      |    2 ++
 net/rxrpc/call_event.c       |    8 +++++++-
 net/rxrpc/input.c            |   10 ++++++++--
 5 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/fs/afs/rotate.c b/fs/afs/rotate.c
index e065bc0768e6..1faef56b12bd 100644
--- a/fs/afs/rotate.c
+++ b/fs/afs/rotate.c
@@ -310,6 +310,10 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc)
 	case -ETIME:
 		_debug("no conn");
 		goto iterate_address;
+
+	case -ECONNRESET:
+		_debug("call reset");
+		goto failed;
 	}
 
 restart_from_beginning:
diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 077e664ac9a2..4fff00e9da8a 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -1459,6 +1459,38 @@ TRACE_EVENT(rxrpc_tx_fail,
 		      __print_symbolic(__entry->what, rxrpc_tx_fail_traces))
 	    );
 
+TRACE_EVENT(rxrpc_call_reset,
+	    TP_PROTO(struct rxrpc_call *call),
+
+	    TP_ARGS(call),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,		debug_id	)
+		    __field(u32,			cid		)
+		    __field(u32,			call_id		)
+		    __field(rxrpc_serial_t,		call_serial	)
+		    __field(rxrpc_serial_t,		conn_serial	)
+		    __field(rxrpc_seq_t,		tx_seq		)
+		    __field(rxrpc_seq_t,		rx_seq		)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->debug_id = call->debug_id;
+		    __entry->cid = call->cid;
+		    __entry->call_id = call->call_id;
+		    __entry->call_serial = call->rx_serial;
+		    __entry->conn_serial = call->conn->hi_serial;
+		    __entry->tx_seq = call->tx_hard_ack;
+		    __entry->rx_seq = call->ackr_seen;
+			   ),
+
+	    TP_printk("c=%08x %08x:%08x r=%08x/%08x tx=%08x rx=%08x",
+		      __entry->debug_id,
+		      __entry->cid, __entry->call_id,
+		      __entry->call_serial, __entry->conn_serial,
+		      __entry->tx_seq, __entry->rx_seq)
+	    );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 29923ec2189c..5fb7d3254d9e 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -477,6 +477,7 @@ enum rxrpc_call_flag {
 	RXRPC_CALL_PINGING,		/* Ping in process */
 	RXRPC_CALL_RETRANS_TIMEOUT,	/* Retransmission due to timeout occurred */
 	RXRPC_CALL_BEGAN_RX_TIMER,	/* We began the expect_rx_by timer */
+	RXRPC_CALL_RX_HEARD,		/* The peer responded at least once to this call */
 };
 
 /*
@@ -624,6 +625,7 @@ struct rxrpc_call {
 						 */
 	rxrpc_seq_t		rx_top;		/* Highest Rx slot allocated. */
 	rxrpc_seq_t		rx_expect_next;	/* Expected next packet sequence number */
+	rxrpc_serial_t		rx_serial;	/* Highest serial received for this call */
 	u8			rx_winsize;	/* Size of Rx window */
 	u8			tx_winsize;	/* Maximum size of Tx window */
 	bool			tx_phase;	/* T if transmission phase, F if receive phase */
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 6e0d788b4dc4..20210418904b 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -392,7 +392,13 @@ void rxrpc_process_call(struct work_struct *work)
 
 	/* Process events */
 	if (test_and_clear_bit(RXRPC_CALL_EV_EXPIRED, &call->events)) {
-		rxrpc_abort_call("EXP", call, 0, RX_USER_ABORT, -ETIME);
+		if (test_bit(RXRPC_CALL_RX_HEARD, &call->flags) &&
+		    (int)call->conn->hi_serial - (int)call->rx_serial > 0) {
+			trace_rxrpc_call_reset(call);
+			rxrpc_abort_call("EXP", call, 0, RX_USER_ABORT, -ECONNRESET);
+		} else {
+			rxrpc_abort_call("EXP", call, 0, RX_USER_ABORT, -ETIME);
+		}
 		set_bit(RXRPC_CALL_EV_ABORT, &call->events);
 		goto recheck_state;
 	}
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index b5fd6381313d..608d078a4981 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -1278,8 +1278,14 @@ void rxrpc_data_ready(struct sock *udp_sk)
 			call = NULL;
 		}
 
-		if (call && sp->hdr.serviceId != call->service_id)
-			call->service_id = sp->hdr.serviceId;
+		if (call) {
+			if (sp->hdr.serviceId != call->service_id)
+				call->service_id = sp->hdr.serviceId;
+			if ((int)sp->hdr.serial - (int)call->rx_serial > 0)
+				call->rx_serial = sp->hdr.serial;
+			if (!test_bit(RXRPC_CALL_RX_HEARD, &call->flags))
+				set_bit(RXRPC_CALL_RX_HEARD, &call->flags);
+		}
 	} else {
 		skew = 0;
 		call = NULL;

^ permalink raw reply related

* Re: [PATCH 0/4] RFC CPSW switchdev mode
From: Andrew Lunn @ 2018-06-03  0:49 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton
In-Reply-To: <2b3cabca-4710-0a71-69c7-cc433e2b3062@ti.com>

Hi Grygorii

> Don't know howto:
> 1) add FDB entry with "blocked" flag - ALE can discard all packets with SRC/DST
> address = blocked MAC
> 2) add multicast MAC address with Supervisory Packet flag set. 
> Such packets will bypass most of checks inside ALE and will be forwarded in all port's
> states except "disabled".
> 3) add "unknown vlan configuration" : ALE provides possibility to configure
> default behavior for tagged packets with "unknown vlan" by configuring 
> - Unknown VLAN Force Untagged Egress ports Mask.
> - Unknown VLAN Registered Multicast Flood Ports Mask
> - Unknown VLAN Multicast Flood ports Mask
> - Unknown VLAN Member ports List
> 4) The way to detect "brctl stp br0 on/off"

You are probably looking at this from the wrong direction. Yes, the
switch can do these things. But the real question is, why would the
network stack want to do this? As i've said before, you are
accelerating the network stack by offloading things to the hardware.

Does the software bridge support FDB with a blocked flag? I don't
think it does. So you first need to extend the software bridge with
this concept. Then you can offload it to the hardware to accelerate
it.

Does the network stack need to forward specific multicast MAC
addresses between bridge ports independent of the state? If there is
no need for it, you don't need to accelerate it.

   Andrew

^ permalink raw reply

* Re: [PATCH 0/4] RFC CPSW switchdev mode
From: Andrew Lunn @ 2018-06-03  0:37 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton
In-Reply-To: <2b3cabca-4710-0a71-69c7-cc433e2b3062@ti.com>

> 1) boot, ping no vlan
> 
> # ip link add name br0 type bridge
> # echo 0 > /sys/class/net/br0/bridge/default_pvid
> # ip link set dev eth2 master br0
> # ip link set dev eth0 master br0
> # ip link set dev eth1 master br0
> # ifconfig br0 192.168.1.2
> 
> *Note*: I've had to disable default_pvid as otherwise linux Bridge adds
> and offloads default vlan 1, but default configuration for CPSW driver is vid 0.
> +  CPSW specific - it can't untag packets for P0.
> Another option I've found:
> # ip link set dev br0 type bridge vlan_filtering 1.
> but anyway, I've found it confusing that Linux bridge adds default vlan when vlan_filtering == 0

There are three different configurations here you need to worry about,
with respect to vlans:

# CONFIG_VLAN_8021Q is not set

So you don't have any vlan support in the kernel.

CONFIG_VLAN_8021Q=y, vlan_filtering = 0

So you have vlans, but filtering is off

CONFIG_VLAN_8021Q=y, vlan_filtering = 1

So you have vlans, and filtering is on.

Even with vlan_filtering off, the bridge still does a little with
vlans.

And you need all three to work correctly. 
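
As a rough illustration of the three cases above, here is a minimal C
sketch of a test matrix. The enum, the function names and the two
boolean inputs are hypothetical and are not kernel or bridge API. They
only stand in for CONFIG_VLAN_8021Q and the bridge's vlan_filtering
setting, and the filtering input is assumed to be irrelevant when the
kernel has no vlan support.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not part of any kernel API. */
enum vlan_mode {
	VLAN_MODE_NO_8021Q,      /* # CONFIG_VLAN_8021Q is not set */
	VLAN_MODE_FILTERING_OFF, /* CONFIG_VLAN_8021Q=y, vlan_filtering = 0 */
	VLAN_MODE_FILTERING_ON,  /* CONFIG_VLAN_8021Q=y, vlan_filtering = 1 */
};

/* Map the two knobs onto the three configurations listed above. */
static enum vlan_mode classify_vlan_mode(bool vlan_8021q, bool vlan_filtering)
{
	if (!vlan_8021q)
		return VLAN_MODE_NO_8021Q; /* filtering setting is ignored */
	return vlan_filtering ? VLAN_MODE_FILTERING_ON
			      : VLAN_MODE_FILTERING_OFF;
}

/* Exercise every case so that none of the three is left untested. */
static void check_vlan_modes(void)
{
	assert(classify_vlan_mode(false, false) == VLAN_MODE_NO_8021Q);
	assert(classify_vlan_mode(true, false) == VLAN_MODE_FILTERING_OFF);
	assert(classify_vlan_mode(true, true) == VLAN_MODE_FILTERING_ON);
}
```

A driver test plan could hang one set of ping/forwarding checks off
each enum value, rather than only testing the default configuration.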

    Andrew

^ permalink raw reply

* Re: [PATCH 0/4] RFC CPSW switchdev mode
From: Andrew Lunn @ 2018-06-03  0:26 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton
In-Reply-To: <2b3cabca-4710-0a71-69c7-cc433e2b3062@ti.com>

> *After this patch set*: goal keep things working the same as max as
> possible and get rid of TI custom tool.

We are happy to keep things the same, if they fit with the switchdev
model. Anything in your custom TI tool/model which does not fit the
switchdev model you won't be able to keep, except if we agree to
extend the model.

I can say now, sw0p0 is going to cause problems. I really do suggest
you drop it for the moment in order to get a minimal driver
accepted. sw0p0 does not fit the switchdev model.

> Below I've described some tested use cases (not include full static configuration),
> but regarding sw0p0 - there is work done by Ivan Khoronzhuk [1] which enables
> adds MQPRIO and CBS Qdisc and targets AVB network features. It required to
> offload MQPRIO and CBS parameters on all ports including P0. In case of P0,
> CPDMA TX channels shapers need to be configured, and in case 
> of sw0p1/sw0p2 internal FIFOS. 
> sw0p0 also expected to be used to configure CPDMA interface in general -
> number of tx/rx channels, rates, ring sizes.

Can this be derived from the configuration on sw0p1 and sw0p2?
sw0p1 has 1 tx channel, sw0p2 has 2 tx channels, so give p0 3 tx
channels?
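
If the answer turns out to be yes, the derivation could be as simple
as a sum over the external ports. A minimal sketch, assuming p0 needs
exactly one tx channel per channel configured on sw0p1/sw0p2; the
helper name and its parameters are hypothetical and not taken from the
CPSW driver.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: derive the p0 tx channel count as the sum of
 * the tx channels configured on the external ports.
 */
static unsigned int derive_p0_tx_channels(const unsigned int *port_tx_channels,
					  size_t nports)
{
	unsigned int total = 0;
	size_t i;

	assert(port_tx_channels != NULL || nports == 0);

	for (i = 0; i < nports; i++)
		total += port_tx_channels[i];

	return total;
}
```

With sw0p1 at 1 channel and sw0p2 at 2 channels this gives 3 for p0,
which matches the numbers in the question above.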

> In addition there is set of global CPSW parameters (not related to P1/P2, like
> MAC Authorization Mode, OUI Deny Mode, crc ) which I've 
> thought can be added to sw0p0 (using ethtool -priv-flags).

You should describe these features, and then we can figure out how
best to model them. devlink might be an option if they are switch
global.

     Andrew

^ permalink raw reply

* Re: [PATCH 0/4] RFC CPSW switchdev mode
From: Andrew Lunn @ 2018-06-03  0:08 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ilias Apalodimas, Ivan Vecera, Jiri Pirko, netdev,
	ivan.khoronzhuk, nsekhar, francois.ozog, yogeshs, spatton
In-Reply-To: <2b3cabca-4710-0a71-69c7-cc433e2b3062@ti.com>

On Sat, Jun 02, 2018 at 06:28:22PM -0500, Grygorii Strashko wrote:

Hi Grygorii

I'm just picking out one thing here... there is lots more good stuff here.

> Additional headache is PTP: we have one PHC, but both external interfaces P1/P2
> can timestamp packets.

This should not be a problem. The Marvell switches have one PHC, but
each port can time stamp packets using this counter. Each port has its
own receive and transmit time stamp registers. So i don't think this
will cause you problems.
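
To make the arrangement concrete, here is a minimal C sketch of one
shared counter with per-port receive and transmit time stamp
registers. The struct layout, the names and the port count are
assumptions for illustration; this is not the Marvell or CPSW register
map.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_EXT_PORTS 2 /* assumption: two external ports, P1 and P2 */

/* Each port has its own receive and transmit time stamp registers. */
struct port_ts_regs {
	uint64_t rx_ts;
	uint64_t tx_ts;
};

/* One PHC counter shared by all ports (hypothetical model). */
struct switch_ptp {
	uint64_t phc_counter;
	struct port_ts_regs port[NUM_EXT_PORTS];
};

/* Latch the shared counter into the given port's receive register. */
static void latch_rx_ts(struct switch_ptp *sw, unsigned int port)
{
	assert(port < NUM_EXT_PORTS);
	sw->port[port].rx_ts = sw->phc_counter;
}

/* Latch the shared counter into the given port's transmit register. */
static void latch_tx_ts(struct switch_ptp *sw, unsigned int port)
{
	assert(port < NUM_EXT_PORTS);
	sw->port[port].tx_ts = sw->phc_counter;
}
```

Because every register is filled from the same counter, time stamps
taken on different ports are directly comparable.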

     Andrew

^ permalink raw reply
