Netdev List
 help / color / mirror / Atom feed
* Re: [net-next PATCH 00/10] BPF: sockmap and sk redirect support
From: David Miller @ 2017-08-16 18:35 UTC (permalink / raw)
  To: john.fastabend; +Cc: daniel, ast, tgraf, netdev, tom
In-Reply-To: <20170816.112819.1162151697604053267.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Wed, 16 Aug 2017 11:28:19 -0700 (PDT)

> From: John Fastabend <john.fastabend@gmail.com>
> Date: Tue, 15 Aug 2017 22:30:15 -0700
> 
>> This series implements a sockmap and socket redirect helper for BPF
>> using a model similar to XDP netdev redirect.
> 
> Series applied, thanks John.
> 

We get a legit warning from gcc due to these changes:

kernel/bpf/sockmap.c: In function ‘smap_state_change’:
kernel/bpf/sockmap.c:156:21: warning: ‘psock’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  struct smap_psock *psock;

It's the default switch case in this function, psock is not initialized
for sure at this point.

^ permalink raw reply

* [net-next PATCH] net: rcu lock and preempt disable missing around generic xdp
From: John Fastabend @ 2017-08-16 18:33 UTC (permalink / raw)
  To: davem, daniel; +Cc: netdev, john.fastabend

do_xdp_generic must be called inside rcu critical section with preempt
disabled to ensure BPF programs are valid and per-cpu variables used
for redirect operations are consistent. This patch ensures this is true
and fixes the splat below.

The netif_receive_skb_internal() code path is now broken into two rcu
critical sections. I decided it was better to limit the preempt_enable/disable
block to just the xdp static key portion and the fallout is more
rcu_read_lock/unlock calls. Seems like the best option to me.

[  607.596901] =============================
[  607.596906] WARNING: suspicious RCU usage
[  607.596912] 4.13.0-rc4+ #570 Not tainted
[  607.596917] -----------------------------
[  607.596923] net/core/dev.c:3948 suspicious rcu_dereference_check() usage!
[  607.596927]
[  607.596927] other info that might help us debug this:
[  607.596927]
[  607.596933]
[  607.596933] rcu_scheduler_active = 2, debug_locks = 1
[  607.596938] 2 locks held by pool/14624:
[  607.596943]  #0:  (rcu_read_lock_bh){......}, at: [<ffffffff95445ffd>] ip_finish_output2+0x14d/0x890
[  607.596973]  #1:  (rcu_read_lock_bh){......}, at: [<ffffffff953c8e3a>] __dev_queue_xmit+0x14a/0xfd0
[  607.597000]
[  607.597000] stack backtrace:
[  607.597006] CPU: 5 PID: 14624 Comm: pool Not tainted 4.13.0-rc4+ #570
[  607.597011] Hardware name: Dell Inc. Precision Tower 5810/0HHV7N, BIOS A17 03/01/2017
[  607.597016] Call Trace:
[  607.597027]  dump_stack+0x67/0x92
[  607.597040]  lockdep_rcu_suspicious+0xdd/0x110
[  607.597054]  do_xdp_generic+0x313/0xa50
[  607.597068]  ? time_hardirqs_on+0x5b/0x150
[  607.597076]  ? mark_held_locks+0x6b/0xc0
[  607.597088]  ? netdev_pick_tx+0x150/0x150
[  607.597117]  netif_rx_internal+0x205/0x3f0
[  607.597127]  ? do_xdp_generic+0xa50/0xa50
[  607.597144]  ? lock_downgrade+0x2b0/0x2b0
[  607.597158]  ? __lock_is_held+0x93/0x100
[  607.597187]  netif_rx+0x119/0x190
[  607.597202]  loopback_xmit+0xfd/0x1b0
[  607.597214]  dev_hard_start_xmit+0x127/0x4e0

Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices")
Fixes: b5cdae3291f7 ("net: Generic XDP")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/core/dev.c |   22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1024d37..47f4864 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3981,7 +3981,13 @@ static int netif_rx_internal(struct sk_buff *skb)
 	trace_netif_rx(skb);
 
 	if (static_key_false(&generic_xdp_needed)) {
-		int ret = do_xdp_generic(skb);
+		int ret;
+
+		preempt_disable();
+		rcu_read_lock();
+		ret = do_xdp_generic(skb);
+		rcu_read_unlock();
+		preempt_enable();
 
 		/* Consider XDP consuming the packet a success from
 		 * the netdev point of view we do not want to count
@@ -4499,17 +4505,21 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
 	if (skb_defer_rx_timestamp(skb))
 		return NET_RX_SUCCESS;
 
-	rcu_read_lock();
 
 	if (static_key_false(&generic_xdp_needed)) {
-		int ret = do_xdp_generic(skb);
+		int ret;
 
-		if (ret != XDP_PASS) {
-			rcu_read_unlock();
+		preempt_disable();
+		rcu_read_lock();
+		ret = do_xdp_generic(skb);
+		rcu_read_unlock();
+		preempt_enable();
+
+		if (ret != XDP_PASS)
 			return NET_RX_DROP;
-		}
 	}
 
+	rcu_read_lock();
 #ifdef CONFIG_RPS
 	if (static_key_false(&rps_needed)) {
 		struct rps_dev_flow voidflow, *rflow = &voidflow;

^ permalink raw reply related

* Re: DSA support for Micrel KSZ8895
From: Pavel Machek @ 2017-08-16 18:32 UTC (permalink / raw)
  To: Andrew Lunn, nathan.leigh.conrad
  Cc: vivien.didelot, f.fainelli, netdev, linux-kernel, Tristram.Ha,
	Woojung.Huh
In-Reply-To: <20170816140451.GA13006@lunn.ch>

[-- Attachment #1: Type: text/plain, Size: 1212 bytes --]

Hi!

> > I've got hardware with KSZ8895, and I'd like to use switch ports as
> > separate ethernet cards. I believe that means DSA support.
> > 
> > And there are even patches available from microchip... unfortunately
> > they are in strange form and for v3.18.
> > 
> > http://www.microchip.com/SWLibraryWeb/product.aspx?product=KSZ8895%20Software%20Linux%203.18
> > 
> > Is there newer version of the driver available somewhere? Is the
> > driver good starting point, or should I start with something else?
> 
> Hi Pavel
> 
> Woojung is the expert here. His DSA driver for the 9477 is a nice
> clean driver.

Good.

> Have you compared the 8895 to the 9477. Are they similar? Could the
> existing 9477 be extended to support the 8895?

Register layouts are completely different, at the very least. I'm not
sure if functionality maps 1:1, that was not exactly easy to check.

I've ported the v3.18 driver to v4.4; it looks like v4.5 port will not
be easy. Plus question is if it works at all... time will tell.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* Re: [net-next PATCH 00/10] BPF: sockmap and sk redirect support
From: David Miller @ 2017-08-16 18:28 UTC (permalink / raw)
  To: john.fastabend; +Cc: daniel, ast, tgraf, netdev, tom
In-Reply-To: <20170816052338.15445.83732.stgit@john-Precision-Tower-5810>

From: John Fastabend <john.fastabend@gmail.com>
Date: Tue, 15 Aug 2017 22:30:15 -0700

> This series implements a sockmap and socket redirect helper for BPF
> using a model similar to XDP netdev redirect.

Series applied, thanks John.

^ permalink raw reply

* Re: [PATCH] net: 3c509: constify pnp_device_id
From: David Miller @ 2017-08-16 18:26 UTC (permalink / raw)
  To: arvind.yadav.cs; +Cc: jarod, dhowells, linux-kernel, netdev
In-Reply-To: <9970265bdf2dd4c4dfbaf6690145d9726748b598.1502859040.git.arvind.yadav.cs@gmail.com>

From: Arvind Yadav <arvind.yadav.cs@gmail.com>
Date: Wed, 16 Aug 2017 10:25:59 +0530

> pnp_device_id are not supposed to change at runtime. All functions
> working with pnp_device_id provided by <linux/pnp.h> work with
> const pnp_device_id. So mark the non-const structs as const.
> 
> Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>

Applied.

^ permalink raw reply

* [PATCH net] ipv6: reset fn->rr_ptr when replacing route
From: Wei Wang @ 2017-08-16 18:18 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: Eric Dumazet, Wei Wang

From: Wei Wang <weiwan@google.com>

syzcaller reported the following use-after-free issue in rt6_select():
BUG: KASAN: use-after-free in rt6_select net/ipv6/route.c:755 [inline] at addr ffff8800bc6994e8
BUG: KASAN: use-after-free in ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084 at addr ffff8800bc6994e8
Read of size 4 by task syz-executor1/439628
CPU: 0 PID: 439628 Comm: syz-executor1 Not tainted 4.3.5+ #8
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 0000000000000000 ffff88018fe435b0 ffffffff81ca384d ffff8801d3588c00
 ffff8800bc699380 ffff8800bc699500 dffffc0000000000 ffff8801d40a47c0
 ffff88018fe435d8 ffffffff81735751 ffff88018fe43660 ffff8800bc699380
Call Trace:
 [<ffffffff81ca384d>] __dump_stack lib/dump_stack.c:15 [inline]
 [<ffffffff81ca384d>] dump_stack+0xc1/0x124 lib/dump_stack.c:51
sctp: [Deprecated]: syz-executor0 (pid 439615) Use of struct sctp_assoc_value in delayed_ack socket option.
Use struct sctp_sack_info instead
 [<ffffffff81735751>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
 [<ffffffff817359c4>] print_address_description mm/kasan/report.c:196 [inline]
 [<ffffffff817359c4>] kasan_report_error+0x1b4/0x4a0 mm/kasan/report.c:285
 [<ffffffff81735d93>] kasan_report mm/kasan/report.c:305 [inline]
 [<ffffffff81735d93>] __asan_report_load4_noabort+0x43/0x50 mm/kasan/report.c:325
 [<ffffffff82a28e39>] rt6_select net/ipv6/route.c:755 [inline]
 [<ffffffff82a28e39>] ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084
 [<ffffffff82a28fb1>] ip6_pol_route_output+0x81/0xb0 net/ipv6/route.c:1203
 [<ffffffff82ab0a50>] fib6_rule_action+0x1f0/0x680 net/ipv6/fib6_rules.c:95
 [<ffffffff8265cbb6>] fib_rules_lookup+0x2a6/0x7a0 net/core/fib_rules.c:223
 [<ffffffff82ab1430>] fib6_rule_lookup+0xd0/0x250 net/ipv6/fib6_rules.c:41
 [<ffffffff82a22006>] ip6_route_output+0x1d6/0x2c0 net/ipv6/route.c:1224
 [<ffffffff829e83d2>] ip6_dst_lookup_tail+0x4d2/0x890 net/ipv6/ip6_output.c:943
 [<ffffffff829e889a>] ip6_dst_lookup_flow+0x9a/0x250 net/ipv6/ip6_output.c:1079
 [<ffffffff82a9f7d8>] ip6_datagram_dst_update+0x538/0xd40 net/ipv6/datagram.c:91
 [<ffffffff82aa0978>] __ip6_datagram_connect net/ipv6/datagram.c:251 [inline]
 [<ffffffff82aa0978>] ip6_datagram_connect+0x518/0xe50 net/ipv6/datagram.c:272
 [<ffffffff82aa1313>] ip6_datagram_connect_v6_only+0x63/0x90 net/ipv6/datagram.c:284
 [<ffffffff8292f790>] inet_dgram_connect+0x170/0x1f0 net/ipv4/af_inet.c:564
 [<ffffffff82565547>] SYSC_connect+0x1a7/0x2f0 net/socket.c:1582
 [<ffffffff8256a649>] SyS_connect+0x29/0x30 net/socket.c:1563
 [<ffffffff82c72032>] entry_SYSCALL_64_fastpath+0x12/0x17
Object at ffff8800bc699380, in cache ip6_dst_cache size: 384

The root cause of it is that in fib6_add_rt2node(), when it replaces an
existing route with the new one, it does not update fn->rr_ptr.
This commit resets fn->rr_ptr to NULL when it points to a route which is
replaced in fib6_add_rt2node().

Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/ip6_fib.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 8c58c7558de0..78b9359b06fd 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1013,6 +1013,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 		nsiblings = iter->rt6i_nsiblings;
 		iter->rt6i_node = NULL;
 		fib6_purge_rt(iter, fn, info->nl_net);
+		if (fn->rr_ptr == iter)
+			fn->rr_ptr = NULL;
 		rt6_release(iter);
 
 		if (nsiblings) {
@@ -1026,6 +1028,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 					*ins = iter->dst.rt6_next;
 					iter->rt6i_node = NULL;
 					fib6_purge_rt(iter, fn, info->nl_net);
+					if (fn->rr_ptr == iter)
+						fn->rr_ptr = NULL;
 					rt6_release(iter);
 					nsiblings--;
 				} else {
-- 
2.14.1.480.gb18f417b89-goog

^ permalink raw reply related

* Re: [PATCH net 2/2] net: ixgbe: Use new IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING flag
From: Alexander Duyck @ 2017-08-16 18:17 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: David Miller, Jeff Kirsher, Kees Cook,
	linux-kernel@vger.kernel.org, sparclinux, intel-wired-lan, Netdev
In-Reply-To: <1502876507-9360-3-git-send-email-dingtianhong@huawei.com>

On Wed, Aug 16, 2017 at 2:41 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
> The ixgbe driver use the compile check to determine if it can
> send TLPs to Root Port with the Relaxed Ordering Attribute set,
> this is too inconvenient, now the new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING
> has been added to the kernel and we could check the bit4 in the PCIe
> Davice Control register to determine whether we should use the Relaxed
> Ordering Attributes or not, so we add a new flag which called
> IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING to the ixgbe driver, it will
> be set if the Root Port couldn't deal the upstream TLPs with Relaxed
> Ordering Attribute, then the driver could know what to do next.
>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>

I would say this is a good start but you probably don't even need the
IXGBE_FLAG2 bit. The hardware will honor the PCIe configuration space.
So I would just say you should probably drop the checks all together
and just strip the #ifdef CONFIG_SPARC bits. It should make the
resultant patch quite small and have the exact same effect.

> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h        |  1 +
>  drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c  | 37 ++++++++++++-------------
>  drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 32 +++++++++++----------
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   | 17 ++++++++++++
>  4 files changed, 53 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index dd55787..50e0553 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> @@ -621,6 +621,7 @@ struct ixgbe_adapter {
>  #define IXGBE_FLAG2_EEE_CAPABLE                        BIT(14)
>  #define IXGBE_FLAG2_EEE_ENABLED                        BIT(15)
>  #define IXGBE_FLAG2_RX_LEGACY                  BIT(16)
> +#define IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING   BIT(17)
>
>         /* Tx fast path data */
>         int num_tx_queues;
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
> index 523f9d0..0727a30 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
> @@ -175,31 +175,30 @@ static s32 ixgbe_init_phy_ops_82598(struct ixgbe_hw *hw)
>   **/
>  static s32 ixgbe_start_hw_82598(struct ixgbe_hw *hw)
>  {
> -#ifndef CONFIG_SPARC
> -       u32 regval;
> -       u32 i;
> -#endif
> +       u32 regval, i;
>         s32 ret_val;
> +       struct ixgbe_adapter *adapter = hw->back;
>
>         ret_val = ixgbe_start_hw_generic(hw);
>
> -#ifndef CONFIG_SPARC
> -       /* Disable relaxed ordering */
> -       for (i = 0; ((i < hw->mac.max_tx_queues) &&
> -            (i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
> -               regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(i));
> -               regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
> -               IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(i), regval);
> -       }
> +       if (adapter->flags2 & IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING) {
> +               /* Disable relaxed ordering */
> +               for (i = 0; ((i < hw->mac.max_tx_queues) &&
> +                    (i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
> +                       regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(i));
> +                       regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
> +                       IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(i), regval);
> +               }
>
> -       for (i = 0; ((i < hw->mac.max_rx_queues) &&
> -            (i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
> -               regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
> -               regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
> -                           IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
> -               IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
> +               for (i = 0; ((i < hw->mac.max_rx_queues) &&
> +                    (i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
> +                       regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
> +                       regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
> +                                   IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
> +                       IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
> +               }
>         }
> -#endif
> +
>         if (ret_val)
>                 return ret_val;
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
> index d4933d2..2473c0b 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
> @@ -342,6 +342,7 @@ s32 ixgbe_start_hw_generic(struct ixgbe_hw *hw)
>  s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
>  {
>         u32 i;
> +       struct ixgbe_adapter *adapter = hw->back;
>
>         /* Clear the rate limiters */
>         for (i = 0; i < hw->mac.max_tx_queues; i++) {
> @@ -350,25 +351,26 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
>         }
>         IXGBE_WRITE_FLUSH(hw);
>
> -#ifndef CONFIG_SPARC
> -       /* Disable relaxed ordering */
> -       for (i = 0; i < hw->mac.max_tx_queues; i++) {
> -               u32 regval;
> +       if (adapter->flags2 & IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING) {
> +               /* Disable relaxed ordering */
> +               for (i = 0; i < hw->mac.max_tx_queues; i++) {
> +                       u32 regval;
>
> -               regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL_82599(i));
> -               regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
> -               IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL_82599(i), regval);
> -       }
> +                       regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL_82599(i));
> +                       regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
> +                       IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL_82599(i), regval);
> +               }
>
> -       for (i = 0; i < hw->mac.max_rx_queues; i++) {
> -               u32 regval;
> +               for (i = 0; i < hw->mac.max_rx_queues; i++) {
> +                       u32 regval;
>
> -               regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
> -               regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
> -                           IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
> -               IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
> +                       regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
> +                       regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
> +                                   IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
> +                       IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
> +               }
>         }
> -#endif
> +
>         return 0;
>  }
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index f1dbdf2..f576be7 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -10081,6 +10081,23 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>                 goto err_ioremap;
>         }
>
> +       /* If possible, we use PCIe Relaxed Ordering Attribute to deliver
> +        * Ingress Packet Data to Free List Buffers in order to allow for
> +        * chipset performance optimizations between the Root Complex and
> +        * Memory Controllers.  (Messages to the associated Ingress Queue
> +        * notifying new Packet Placement in the Free Lists Buffers will be
> +        * send without the Relaxed Ordering Attribute thus guaranteeing that
> +        * all preceding PCIe Transaction Layer Packets will be processed
> +        * first.)  But some Root Complexes have various issues with Upstream
> +        * Transaction Layer Packets with the Relaxed Ordering Attribute set.
> +        * The PCIe devices which under the Root Complexes will be cleared the
> +        * Relaxed Ordering bit in the configuration space, So we check our
> +        * PCIe configuration space to see if it's flagged with advice against
> +        * using Relaxed Ordering.
> +        */
> +       if (!pcie_relaxed_ordering_enabled(pdev))
> +               adapter->flags2 |= IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING;
> +
>         netdev->netdev_ops = &ixgbe_netdev_ops;
>         ixgbe_set_ethtool_ops(netdev);
>         netdev->watchdog_timeo = 5 * HZ;
> --
> 1.8.3.1
>
>

^ permalink raw reply

* [PATCH v3] sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
From: Alexander Potapenko @ 2017-08-16 18:16 UTC (permalink / raw)
  To: dvyukov, kcc, edumazet, marcelo.leitner, lucien.xin, vyasevich,
	davem, hideaki.yoshifuji
  Cc: linux-kernel, linux-sctp, netdev

KMSAN reported use of uninitialized sctp_addr->v4.sin_addr.s_addr and
sctp_addr->v6.sin6_scope_id in sctp_v6_cmp_addr() (see below).
Make sure all fields of an IPv6 address are initialized, which
guarantees that the IPv4 fields are also initialized.

==================================================================
 BUG: KMSAN: use of uninitialized memory in sctp_v6_cmp_addr+0x8d4/0x9f0
 net/sctp/ipv6.c:517
 CPU: 2 PID: 31056 Comm: syz-executor1 Not tainted 4.11.0-rc5+ #2944
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
 01/01/2011
 Call Trace:
  dump_stack+0x172/0x1c0 lib/dump_stack.c:42
  is_logbuf_locked mm/kmsan/kmsan.c:59 [inline]
  kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:938
  native_save_fl arch/x86/include/asm/irqflags.h:18 [inline]
  arch_local_save_flags arch/x86/include/asm/irqflags.h:72 [inline]
  arch_local_irq_save arch/x86/include/asm/irqflags.h:113 [inline]
  __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:467
  sctp_v6_cmp_addr+0x8d4/0x9f0 net/sctp/ipv6.c:517
  sctp_v6_get_dst+0x8c7/0x1630 net/sctp/ipv6.c:290
  sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
  sctp_assoc_add_peer+0x66d/0x16f0 net/sctp/associola.c:651
  sctp_sendmsg+0x35a5/0x4f90 net/sctp/socket.c:1871
  inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
  sock_sendmsg_nosec net/socket.c:633 [inline]
  sock_sendmsg net/socket.c:643 [inline]
  SYSC_sendto+0x608/0x710 net/socket.c:1696
  SyS_sendto+0x8a/0xb0 net/socket.c:1664
  entry_SYSCALL_64_fastpath+0x13/0x94
 RIP: 0033:0x44b479
 RSP: 002b:00007f6213f21c08 EFLAGS: 00000286 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 000000000044b479
 RDX: 0000000000000041 RSI: 0000000020edd000 RDI: 0000000000000006
 RBP: 00000000007080a8 R08: 0000000020b85fe4 R09: 000000000000001c
 R10: 0000000000040005 R11: 0000000000000286 R12: 00000000ffffffff
 R13: 0000000000003760 R14: 00000000006e5820 R15: 0000000000ff8000
 origin description: ----dst_saddr@sctp_v6_get_dst
 local variable created at:
  sk_fullsock include/net/sock.h:2321 [inline]
  inet6_sk include/linux/ipv6.h:309 [inline]
  sctp_v6_get_dst+0x91/0x1630 net/sctp/ipv6.c:241
  sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
==================================================================
 BUG: KMSAN: use of uninitialized memory in sctp_v6_cmp_addr+0x8d4/0x9f0
 net/sctp/ipv6.c:517
 CPU: 2 PID: 31056 Comm: syz-executor1 Not tainted 4.11.0-rc5+ #2944
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
 01/01/2011
 Call Trace:
  dump_stack+0x172/0x1c0 lib/dump_stack.c:42
  is_logbuf_locked mm/kmsan/kmsan.c:59 [inline]
  kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:938
  native_save_fl arch/x86/include/asm/irqflags.h:18 [inline]
  arch_local_save_flags arch/x86/include/asm/irqflags.h:72 [inline]
  arch_local_irq_save arch/x86/include/asm/irqflags.h:113 [inline]
  __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:467
  sctp_v6_cmp_addr+0x8d4/0x9f0 net/sctp/ipv6.c:517
  sctp_v6_get_dst+0x8c7/0x1630 net/sctp/ipv6.c:290
  sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
  sctp_assoc_add_peer+0x66d/0x16f0 net/sctp/associola.c:651
  sctp_sendmsg+0x35a5/0x4f90 net/sctp/socket.c:1871
  inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
  sock_sendmsg_nosec net/socket.c:633 [inline]
  sock_sendmsg net/socket.c:643 [inline]
  SYSC_sendto+0x608/0x710 net/socket.c:1696
  SyS_sendto+0x8a/0xb0 net/socket.c:1664
  entry_SYSCALL_64_fastpath+0x13/0x94
 RIP: 0033:0x44b479
 RSP: 002b:00007f6213f21c08 EFLAGS: 00000286 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 000000000044b479
 RDX: 0000000000000041 RSI: 0000000020edd000 RDI: 0000000000000006
 RBP: 00000000007080a8 R08: 0000000020b85fe4 R09: 000000000000001c
 R10: 0000000000040005 R11: 0000000000000286 R12: 00000000ffffffff
 R13: 0000000000003760 R14: 00000000006e5820 R15: 0000000000ff8000
 origin description: ----dst_saddr@sctp_v6_get_dst
 local variable created at:
  sk_fullsock include/net/sock.h:2321 [inline]
  inet6_sk include/linux/ipv6.h:309 [inline]
  sctp_v6_get_dst+0x91/0x1630 net/sctp/ipv6.c:241
  sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
==================================================================

Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
v3: set flowinfo between port and addr to ease code review.
v2 is identical to v1, resending per request by Marcelo Ricardo Leitner.
---
 net/sctp/ipv6.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 2a186b201ad2..a4b6ffb61495 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -512,7 +512,9 @@ static void sctp_v6_to_addr(union sctp_addr *addr, struct in6_addr *saddr,
 {
 	addr->sa.sa_family = AF_INET6;
 	addr->v6.sin6_port = port;
+	addr->v6.sin6_flowinfo = 0;
 	addr->v6.sin6_addr = *saddr;
+	addr->v6.sin6_scope_id = 0;
 }
 
 /* Compare addresses exactly.
-- 
2.14.1.480.gb18f417b89-goog

^ permalink raw reply related

* Re: [PATCH] net: igmp: Use ingress interface rather than vrf device
From: David Miller @ 2017-08-16 18:10 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, nikolay
In-Reply-To: <1502847522-13020-1-git-send-email-dsahern@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Tue, 15 Aug 2017 18:38:42 -0700

> Anuradha reported that statically added groups for interfaces enslaved
> to a VRF device were not persisting. The problem is that igmp queries
> and reports need to use the data in the in_dev for the real ingress
> device rather than the VRF device. Update igmp_rcv accordingly.
> 
> Fixes: e58e41596811 ("net: Enable support for VRF with ipv4 multicast")
> Reported-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* [PATCH net] ipv4: better IP_MAX_MTU enforcement
From: Eric Dumazet @ 2017-08-16 18:09 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

While working on yet another syzkaller report, I found
that our IP_MAX_MTU enforcements were not properly done.

gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
final result can be bigger than IP_MAX_MTU :/

This is a problem because device mtu can be changed on other cpus or
threads.

While this patch does not fix the issue I am working on, it is
probably worth addressing it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/ip.h |    4 ++--
 net/ipv4/route.c |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 821cedcc8e73..0cf7f5a65fe6 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -352,7 +352,7 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
 	    !forwarding)
 		return dst_mtu(dst);
 
-	return min(dst->dev->mtu, IP_MAX_MTU);
+	return min(READ_ONCE(dst->dev->mtu), IP_MAX_MTU);
 }
 
 static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
@@ -364,7 +364,7 @@ static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
 		return ip_dst_mtu_maybe_forward(skb_dst(skb), forwarding);
 	}
 
-	return min(skb_dst(skb)->dev->mtu, IP_MAX_MTU);
+	return min(READ_ONCE(skb_dst(skb)->dev->mtu), IP_MAX_MTU);
 }
 
 u32 ip_idents_reserve(u32 hash, int segs);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 7effa62beed3..fe877a4a72b1 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1267,7 +1267,7 @@ static unsigned int ipv4_mtu(const struct dst_entry *dst)
 	if (mtu)
 		return mtu;
 
-	mtu = dst->dev->mtu;
+	mtu = READ_ONCE(dst->dev->mtu);
 
 	if (unlikely(dst_metric_locked(dst, RTAX_MTU))) {
 		if (rt->rt_uses_gateway && mtu > 576)

^ permalink raw reply related

* Re: [PATCH net-next] liquidio: update VF's netdev->max_mtu if there's a change in PF's MTU
From: David Miller @ 2017-08-16 18:08 UTC (permalink / raw)
  To: felix.manlunas
  Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
	veerasenareddy.burru
In-Reply-To: <20170815232622.GA2385@felix-thinkpad.cavium.com>

From: Felix Manlunas <felix.manlunas@cavium.com>
Date: Tue, 15 Aug 2017 16:26:22 -0700

> From: Veerasenareddy Burru <veerasenareddy.burru@cavium.com>
> 
> A VF's MTU is capped at the parent PF's MTU.  So if there's a change in the
> PF's MTU, then update the VF's netdev->max_mtu.
> 
> Also remove duplicate log messages for MTU change.
> 
> Signed-off-by: Veerasenareddy Burru <veerasenareddy.burru@cavium.com>
> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next 1/3] VMCI: only load on VMware hypervisor
From: David Miller @ 2017-08-16 18:06 UTC (permalink / raw)
  To: decui
  Cc: netdev, gregkh, devel, kys, haiyangz, sthemmin, georgezhang,
	jhansen, mkubecek, asias, stefanha, vkuznets, cavery, jasowang,
	rolf.neugebauer, dave.scott, marcelo.cerri, apw, olaf, joe,
	linux-kernel, dan.carpenter
In-Reply-To: <KL1P15301MB00083C97259FF2A700C78442BF8D0@KL1P15301MB0008.APCP153.PROD.OUTLOOK.COM>

From: Dexuan Cui <decui@microsoft.com>
Date: Tue, 15 Aug 2017 22:13:29 +0000

> +	/*
> +	 * Check if we are running on VMware's hypervisor and bail out
> +	 * if we are not.
> +	 */
> +	if (x86_hyper != &x86_hyper_vmware)
> +		return -ENODEV;

This symbol is only available when CONFIG_HYPERVISOR_GUEST is defined.
But this driver does not have a Kconfig dependency on that symbol so
the build can fail in some configurations.

^ permalink raw reply

* Re: [PATCH net-next 0/4] various sizeof cleanups
From: David Miller @ 2017-08-16 18:02 UTC (permalink / raw)
  To: stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ
  Cc: mlindner-eYqpPyKDWXRBDgjK7y7TUQ, mst-H+wXaHxf7aLQT0dZR+AlfA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	sthemmin-0li6OtcxBFHby3iVrkZq2A
In-Reply-To: <20170815172919.26153-1-sthemmin-0li6OtcxBFHby3iVrkZq2A@public.gmane.org>

From: Stephen Hemminger <stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org>
Date: Tue, 15 Aug 2017 10:29:15 -0700

> Noticed some places that were using sizeof as an operator.
> This is legal C but is not the convention used in the kernel.

Series applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 2/2] net_sched/hfsc: opencode trivial set_active() and set_passive()
From: David Miller @ 2017-08-16 17:57 UTC (permalink / raw)
  To: khlebnikov; +Cc: netdev, xiyou.wangcong, jkosina, edumazet, jhs
In-Reply-To: <150280440342.717891.16424904669818880087.stgit@buzz>

From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Date: Tue, 15 Aug 2017 16:40:03 +0300

> Any move comment abount update_vf() into right place.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

Applied.

^ permalink raw reply

* Re: [PATCH 1/2] net_sched: call qlen_notify only if child qdisc is empty
From: David Miller @ 2017-08-16 17:56 UTC (permalink / raw)
  To: khlebnikov; +Cc: netdev, xiyou.wangcong, jkosina, edumazet, jhs
In-Reply-To: <150280439968.717891.6476652519937001709.stgit@buzz>

From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Date: Tue, 15 Aug 2017 16:39:59 +0300

> This callback is used for deactivating class in parent qdisc.
> This is cheaper to test queue length right here.
> 
> Also this allows to catch draining screwed backlog and prevent
> second deactivation of already inactive parent class which will
> crash kernel for sure. Kernel with print warning at destruction
> of child qdisc where no packets but backlog is not zero.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

Applied.

^ permalink raw reply

* Re: [PATCH net 0/2] net: ixgbe: Use new flag to disable Relaxed Ordering
From: David Miller @ 2017-08-16 17:56 UTC (permalink / raw)
  To: dingtianhong
  Cc: jeffrey.t.kirsher, keescook, linux-kernel, sparclinux,
	intel-wired-lan, alexander.duyck, netdev
In-Reply-To: <1502876507-9360-1-git-send-email-dingtianhong@huawei.com>

From: Ding Tianhong <dingtianhong@huawei.com>
Date: Wed, 16 Aug 2017 17:41:45 +0800

> The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
> to indicate that Relaxed Ordering Attributes (RO) should not
> be used for Transaction Layer Packets (TLP) targeted toward
> these affected Root Port, it will clear the bit4 in the PCIe
> Device Control register, so the PCIe device drivers could
> query PCIe configuration space to determine if it can send
> TLPs to Root Port with the Relaxed Ordering Attributes set.
> 
> The ixgbe driver could use this flag to determine if it can
> send TLPs to Root Port with the Relaxed Ordering Attributes set.

I'll let the Intel guys pick this up.

^ permalink raw reply

* [PATCH v2] i40e/i40evf: use cpumask_copy() for assigning cpumask
From: Juergen Gross @ 2017-08-16 17:52 UTC (permalink / raw)
  To: linux-kernel, netdev, intel-wired-lan
  Cc: jeffrey.t.kirsher, Juergen Gross, stable

Using direct assignment for a cpumask is wrong, cpumask_copy() should
be used instead.

Otherwise crashes like the following might happen:

[62792.326374] BUG: unable to handle kernel paging request at ffff8800049ff000
[62792.340118] IP: [<ffffffffa2043341>] i40e_irq_affinity_notify+0x11/0x20 [i40e]
...
[62792.810770] Call Trace:
[62792.815722]  [<ffffffff810d77a5>] irq_affinity_notify+0xb5/0xf0
[62792.827593]  [<ffffffff8109593e>] process_one_work+0x14e/0x410
[62792.839282]  [<ffffffff81096196>] worker_thread+0x116/0x490
[62792.850459]  [<ffffffff8109b667>] kthread+0xc7/0xe0
[62792.860255]  [<ffffffff816094bf>] ret_from_fork+0x3f/0x70
[62792.871996] DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70

Fixes: 96db776a3682 ("i40e/i40evf: fix interrupt affinity bug")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2: enhance commit message, merge patches
---
 drivers/net/ethernet/intel/i40e/i40e_main.c     | 2 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 2db93d3f6d23..c0e42d162c7c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3495,7 +3495,7 @@ static void i40e_irq_affinity_notify(struct irq_affinity_notify *notify,
 	struct i40e_q_vector *q_vector =
 		container_of(notify, struct i40e_q_vector, affinity_notify);
 
-	q_vector->affinity_mask = *mask;
+	cpumask_copy(&q_vector->affinity_mask, mask);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 7c213a347909..a4b60367ecce 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -520,7 +520,7 @@ static void i40evf_irq_affinity_notify(struct irq_affinity_notify *notify,
 	struct i40e_q_vector *q_vector =
 		container_of(notify, struct i40e_q_vector, affinity_notify);
 
-	q_vector->affinity_mask = *mask;
+	cpumask_copy(&q_vector->affinity_mask, mask);
 }
 
 /**
-- 
2.12.3

^ permalink raw reply related

* Re: [patch net-next repost 2/3] net/sched: Change cls_flower to use IDR
From: David Miller @ 2017-08-16 17:50 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: chrism, netdev, jhs, jiri, mawilcox
In-Reply-To: <CAM_iQpXOKrT658X1ADfz=WmKryqUgzfOwUF6uTnPNG00EPOQdg@mail.gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Wed, 16 Aug 2017 10:42:01 -0700

> Also, _possibly_ we may make the tc filter handle generator
> generic?

Yes, this is a good idea.

^ permalink raw reply

* Stuck TX Using Iperf
From: Robert Jones @ 2017-08-16 17:46 UTC (permalink / raw)
  To: netdev, sgoutham, rric; +Cc: Tim Harvey

Hello Cavium Ethernet Driver Maintainers,

I'm working on a custom board using a Cavium OcteonTX CN80XX cpu
running a mainline 4.12.7 kernel and I've run into a problem where the
TX of my BGX0 configured for SGMII becomes stuck.

After boot I'm able to bring the interface up, run dhclient, and ping
a host machine with no issues. However, every time that I try to run
an iperf TCP test I get a stuck TX queue. Bringing the interface down
and then up does not resolve the problem, but physically reconnecting
the cable connected to the interface does. Also, after I reconnect the
cable I am no longer able to reproduce the stuck TX queue until
reboot. I receive no kernel driver messages of any sort during my
iperf test, it just stalls until I kill it.

Any help would be appreciated.

Regards,

Robert Jones - Software Engineer
Gateworks Corporation

^ permalink raw reply

* Re: [PATCH net-next 3/3 v4] drivers: net: ethernet: qualcomm: rmnet: Initial implementation
From: Subash Abhinov Kasiviswanathan @ 2017-08-16 17:46 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, fengguang.wu, dcbw, stephen
In-Reply-To: <20170816113017.GJ1868@nanopsycho>

>> +struct rmnet_phys_ep_conf_s {
> 
> The name is cryptic. Why "_s"?
>> +
>> +struct rmnet_vnd_private_s {
> 
> Again, cryptic.
> 
> 
>> +	struct rmnet_logical_ep_conf_s local_ep;
>> +	u32 msg_enable;
>> +};

Hi Jiri

The _s was to indicate struct. I'll remove that.

>> +rx_handler_result_t rmnet_ingress_handler(struct sk_buff *skb)
>> +{
>> +	struct rmnet_phys_ep_conf_s *config;
> 
> I still fail to understand why the name of this is "config". Please
> change to something else across whole code. Including the name of the 
> struct.
> 
>> +	struct net_device *dev;
>> +	int rc;
>> +
>> +	if (!skb)
>> +		return RX_HANDLER_CONSUMED;
>> +
>> +	dev = skb->dev;
>> +	config = rmnet_get_phys_ep_config(skb->dev);
> 
> You have dev. Why not use dev?
> 
>> +rx_handler_result_t rmnet_rx_handler(struct sk_buff **pskb)
>> +{
>> +	return rmnet_ingress_handler(*pskb);
> 
> This is just silly. Why you don't have the content of 
> rmnet_ingress_handler
> right here?

I'll make these changes now.

^ permalink raw reply

* Re: [patch net-next repost 2/3] net/sched: Change cls_flower to use IDR
From: Cong Wang @ 2017-08-16 17:42 UTC (permalink / raw)
  To: Chris Mi
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller, mawilcox
In-Reply-To: <1502871244-19870-3-git-send-email-chrism@mellanox.com>

On Wed, Aug 16, 2017 at 1:14 AM, Chris Mi <chrism@mellanox.com> wrote:
> Currently, all filters with the same priority are linked in a doubly
> linked list. Every filter should have a unique handle. To make the
> handle unique, we need to iterate the list every time to see if the
> handle exists or not when inserting a new filter. It is time-consuming.
> For example, it takes about 5m3.169s to insert 64K rules.
>
> This patch changes cls_flower to use IDR. With this patch, it
> takes about 0m1.127s to insert 64K rules. The improvement is huge.
>
> But please note that in this testing, all filters share the same action.
> If every filter has a unique action, that is another bottleneck.
> Follow-up patch in this patchset addresses that.
>
> Signed-off-by: Chris Mi <chrism@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

While you are on it, why only migrate cls_flower? I am sure
we have similar handle generators in other tc flters too.

Also, _possibly_ we may make the tc filter handle generator
generic?

Thanks.

^ permalink raw reply

* Re: [PATCH net-next 3/3 v4] drivers: net: ethernet: qualcomm: rmnet: Initial implementation
From: Subash Abhinov Kasiviswanathan @ 2017-08-16 17:38 UTC (permalink / raw)
  To: David Miller, David Laight; +Cc: netdev, fengguang.wu, dcbw, jiri, stephen
In-Reply-To: <20170815.212414.71937112074428743.davem@davemloft.net>

>> > +static int rmnet_unregister_real_device(struct net_device *dev)
>> > +{
>> > +	int config_id = RMNET_LOCAL_LOGICAL_ENDPOINT;
>> > +	struct rmnet_logical_ep_conf_s *epconfig_l;
>> > +	struct rmnet_phys_ep_conf_s *config;
>> > +
>> > +	ASSERT_RTNL();
>> > +
>> > +	netdev_info(dev, "Removing device %s\n", dev->name);
>> > +
>> > +	if (!rmnet_is_real_dev_registered(dev))
>> > +		return -EINVAL;
>> > +
>> > +	for (; config_id < RMNET_MAX_LOGICAL_EP; config_id++) {
>> 
>> This loop is so much harder to understand because you initialize
>> the loop index several lines above the for() statement.  Just
>> initialize it here at the beginning of the for() loop and deal
>> with the fact that this will have to therefore be a multi-line
>> for() statement the best you can.
> ...
> 
> One way to split the multi-line for() is to put the initialiser
> on the immediately preceding line:
> 	config_id = RMNET_LOCAL_LOGICAL_ENDPOINT;
> 	for (; config_id < RMNET_MAX_LOGICAL_EP; config_id++) {
> 
> 	David

Hi DaveM / David

I'll move the initialization as above.

> 
>> +static int rmnet_set_egress_data_format(struct net_device *dev,	u32 
>> edf,
>> +					u16 agg_size, u16 agg_count)
> 
> Use a space instead of a TAB character before the "u32 edf," argument.
> 

I'll fix this.

>> +static int
>> +__rmnet_set_logical_endpoint_config(struct net_device *dev,
>> +				    int config_id,
>> +				    struct rmnet_logical_ep_conf_s *epconfig)
>> +{
>> +	struct rmnet_logical_ep_conf_s *epconfig_l;
>> +
>> +	ASSERT_RTNL();
>> +
>> +	epconfig_l = rmnet_get_logical_ep(dev, config_id);
>> +
>> +	if (!epconfig_l || epconfig_l->refcount)
>> +		return -EINVAL;
>> +
>> +	memcpy(epconfig_l, epconfig, sizeof(struct 
>> rmnet_logical_ep_conf_s));
>> +	if (config_id == RMNET_LOCAL_LOGICAL_ENDPOINT)
>> +		epconfig_l->mux_id = 0;
>> +	else
>> +		epconfig_l->mux_id = config_id;
>> +
>> +	dev_hold(epconfig_l->egress_dev);
>> +	return 0;
>> +}
> 
> This looks really messy.
> 
> One invariant that must hold is that if I try to take down netdev
> "X", all resources holding a reference to X will be immediately
> (or quickly) dropped when that request comes in via notifiers.
> 
> Will this happen properly for this egress_dev reference counting?
> 
>> +static int rmnet_config_notify_cb(struct notifier_block *nb,
>> +				  unsigned long event, void *data)
>> +{
>> +	struct net_device *dev = netdev_notifier_info_to_dev(data);
>> +
>> +	if (!dev)
>> +		return NOTIFY_DONE;
>> +
>> +	switch (event) {
>> +	case NETDEV_UNREGISTER_FINAL:
>> +	case NETDEV_UNREGISTER:
>> +		netdev_info(dev, "Kernel unregister\n");
>> +		rmnet_force_unassociate_device(dev);
>> +		break;
> 
> So I guess it happens here?

Yes, all the rmnet devices are removed from the real_dev
in the notifier callback.

^ permalink raw reply

* [PATCH net] ptr_ring: use kmalloc_array()
From: Eric Dumazet @ 2017-08-16 17:36 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Michael S. Tsirkin, Jason Wang

From: Eric Dumazet <edumazet@google.com>

As found by syzkaller, malicious users can set whatever tx_queue_len
on a tun device and eventually crash the kernel.

Lets remove the ALIGN(XXX, SMP_CACHE_BYTES) thing since a small
ring buffer is not fast anyway.

Fixes: 2e0ab8ca83c1 ("ptr_ring: array based FIFO for pointers")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
---
 include/linux/ptr_ring.h  |    9 +++++----
 include/linux/skb_array.h |    3 ++-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index d8c97ec8a8e6..37b4bb2545b3 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -436,9 +436,9 @@ static inline int ptr_ring_consume_batched_bh(struct ptr_ring *r,
 	__PTR_RING_PEEK_CALL_v; \
 })
 
-static inline void **__ptr_ring_init_queue_alloc(int size, gfp_t gfp)
+static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
 {
-	return kzalloc(ALIGN(size * sizeof(void *), SMP_CACHE_BYTES), gfp);
+	return kcalloc(size, sizeof(void *), gfp);
 }
 
 static inline void __ptr_ring_set_size(struct ptr_ring *r, int size)
@@ -582,7 +582,8 @@ static inline int ptr_ring_resize(struct ptr_ring *r, int size, gfp_t gfp,
  * In particular if you consume ring in interrupt or BH context, you must
  * disable interrupts/BH when doing so.
  */
-static inline int ptr_ring_resize_multiple(struct ptr_ring **rings, int nrings,
+static inline int ptr_ring_resize_multiple(struct ptr_ring **rings,
+					   unsigned int nrings,
 					   int size,
 					   gfp_t gfp, void (*destroy)(void *))
 {
@@ -590,7 +591,7 @@ static inline int ptr_ring_resize_multiple(struct ptr_ring **rings, int nrings,
 	void ***queues;
 	int i;
 
-	queues = kmalloc(nrings * sizeof *queues, gfp);
+	queues = kmalloc_array(nrings, sizeof(*queues), gfp);
 	if (!queues)
 		goto noqueues;
 
diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
index 35226cd4efb0..8621ffdeecbf 100644
--- a/include/linux/skb_array.h
+++ b/include/linux/skb_array.h
@@ -193,7 +193,8 @@ static inline int skb_array_resize(struct skb_array *a, int size, gfp_t gfp)
 }
 
 static inline int skb_array_resize_multiple(struct skb_array **rings,
-					    int nrings, int size, gfp_t gfp)
+					    int nrings, unsigned int size,
+					    gfp_t gfp)
 {
 	BUILD_BUG_ON(offsetof(struct skb_array, ring));
 	return ptr_ring_resize_multiple((struct ptr_ring **)rings,

^ permalink raw reply related

* Re: [PATCH 1/2] net_sched: call qlen_notify only if child qdisc is empty
From: Konstantin Khlebnikov @ 2017-08-16 17:32 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, David S. Miller, Jiri Kosina,
	Eric Dumazet, Jamal Hadi Salim
In-Reply-To: <CAM_iQpWmJsMWrqoA3h1eLkssaMXOc6kbsj75S9mdSokFKunumw@mail.gmail.com>



On 16.08.2017 20:22, Cong Wang wrote:
> On Tue, Aug 15, 2017 at 6:39 AM, Konstantin Khlebnikov
> <khlebnikov@yandex-team.ru> wrote:
>> This callback is used for deactivating class in parent qdisc.
>> This is cheaper to test queue length right here.
>>
>> Also this allows to catch draining screwed backlog and prevent
>> second deactivation of already inactive parent class which will
>> crash kernel for sure. Kernel with print warning at destruction
>> of child qdisc where no packets but backlog is not zero.
> 
> Good cleanup. Please explicitly mark patches like this as
> for net-next.
> 
> Just one comment below.
> 
> 
>> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
>> index bd24a550e0f9..18da45c0769c 100644
>> --- a/net/sched/sch_api.c
>> +++ b/net/sched/sch_api.c
>> @@ -752,6 +752,7 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, unsigned int n,
>>          const struct Qdisc_class_ops *cops;
>>          unsigned long cl;
>>          u32 parentid;
>> +       bool notify;
>>          int drops;
>>
>>          if (n == 0 && len == 0)
>> @@ -764,6 +765,13 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, unsigned int n,
>>
>>                  if (sch->flags & TCQ_F_NOPARENT)
>>                          break;
>> +               /* Notify parent qdisc only if child qdisc becomes empty.
>> +                *
>> +                * If child was empty even before update then backlog
>> +                * counter is screwed and we skip notification because
>> +                * parent class is already passive.
>> +                */
>> +               notify = !sch->q.qlen && !WARN_ON_ONCE(!n);
> 
> Since 'n' never changes in this function, can we just move
> this WARN_ON_ONCE() right before the loop??
> 

Here warning evaluated only if qlen is zero.
'n' could be zero not together with non-zero 'len' and zero 'qlen'.
This means queue already was empty before reduce but caller somehow reduced it even further.

^ permalink raw reply

* Re: [PATCH 1/2] net_sched: call qlen_notify only if child qdisc is empty
From: Cong Wang @ 2017-08-16 17:22 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Linux Kernel Network Developers, David S. Miller, Jiri Kosina,
	Eric Dumazet, Jamal Hadi Salim
In-Reply-To: <150280439968.717891.6476652519937001709.stgit@buzz>

On Tue, Aug 15, 2017 at 6:39 AM, Konstantin Khlebnikov
<khlebnikov@yandex-team.ru> wrote:
> This callback is used for deactivating class in parent qdisc.
> This is cheaper to test queue length right here.
>
> Also this allows to catch draining screwed backlog and prevent
> second deactivation of already inactive parent class which will
> crash kernel for sure. Kernel with print warning at destruction
> of child qdisc where no packets but backlog is not zero.

Good cleanup. Please explicitly mark patches like this as
for net-next.

Just one comment below.


> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index bd24a550e0f9..18da45c0769c 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -752,6 +752,7 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, unsigned int n,
>         const struct Qdisc_class_ops *cops;
>         unsigned long cl;
>         u32 parentid;
> +       bool notify;
>         int drops;
>
>         if (n == 0 && len == 0)
> @@ -764,6 +765,13 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, unsigned int n,
>
>                 if (sch->flags & TCQ_F_NOPARENT)
>                         break;
> +               /* Notify parent qdisc only if child qdisc becomes empty.
> +                *
> +                * If child was empty even before update then backlog
> +                * counter is screwed and we skip notification because
> +                * parent class is already passive.
> +                */
> +               notify = !sch->q.qlen && !WARN_ON_ONCE(!n);

Since 'n' never changes in this function, can we just move
this WARN_ON_ONCE() right before the loop??

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox