* Re: [net-next PATCH 00/10] BPF: sockmap and sk redirect support
From: David Miller @ 2017-08-16 18:35 UTC (permalink / raw)
To: john.fastabend; +Cc: daniel, ast, tgraf, netdev, tom
In-Reply-To: <20170816.112819.1162151697604053267.davem@davemloft.net>
From: David Miller <davem@davemloft.net>
Date: Wed, 16 Aug 2017 11:28:19 -0700 (PDT)
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Tue, 15 Aug 2017 22:30:15 -0700
>
>> This series implements a sockmap and socket redirect helper for BPF
>> using a model similar to XDP netdev redirect.
>
> Series applied, thanks John.
>
We get a legit warning from gcc due to these changes:
kernel/bpf/sockmap.c: In function ‘smap_state_change’:
kernel/bpf/sockmap.c:156:21: warning: ‘psock’ may be used uninitialized in this function [-Wmaybe-uninitialized]
struct smap_psock *psock;
It's the default switch case in this function, psock is not initialized
for sure at this point.
^ permalink raw reply
* [net-next PATCH] net: rcu lock and preempt disable missing around generic xdp
From: John Fastabend @ 2017-08-16 18:33 UTC (permalink / raw)
To: davem, daniel; +Cc: netdev, john.fastabend
do_xdp_generic must be called inside rcu critical section with preempt
disabled to ensure BPF programs are valid and per-cpu variables used
for redirect operations are consistent. This patch ensures this is true
and fixes the splat below.
The netif_receive_skb_internal() code path is now broken into two rcu
critical sections. I decided it was better to limit the preempt_enable/disable
block to just the xdp static key portion and the fallout is more
rcu_read_lock/unlock calls. Seems like the best option to me.
[ 607.596901] =============================
[ 607.596906] WARNING: suspicious RCU usage
[ 607.596912] 4.13.0-rc4+ #570 Not tainted
[ 607.596917] -----------------------------
[ 607.596923] net/core/dev.c:3948 suspicious rcu_dereference_check() usage!
[ 607.596927]
[ 607.596927] other info that might help us debug this:
[ 607.596927]
[ 607.596933]
[ 607.596933] rcu_scheduler_active = 2, debug_locks = 1
[ 607.596938] 2 locks held by pool/14624:
[ 607.596943] #0: (rcu_read_lock_bh){......}, at: [<ffffffff95445ffd>] ip_finish_output2+0x14d/0x890
[ 607.596973] #1: (rcu_read_lock_bh){......}, at: [<ffffffff953c8e3a>] __dev_queue_xmit+0x14a/0xfd0
[ 607.597000]
[ 607.597000] stack backtrace:
[ 607.597006] CPU: 5 PID: 14624 Comm: pool Not tainted 4.13.0-rc4+ #570
[ 607.597011] Hardware name: Dell Inc. Precision Tower 5810/0HHV7N, BIOS A17 03/01/2017
[ 607.597016] Call Trace:
[ 607.597027] dump_stack+0x67/0x92
[ 607.597040] lockdep_rcu_suspicious+0xdd/0x110
[ 607.597054] do_xdp_generic+0x313/0xa50
[ 607.597068] ? time_hardirqs_on+0x5b/0x150
[ 607.597076] ? mark_held_locks+0x6b/0xc0
[ 607.597088] ? netdev_pick_tx+0x150/0x150
[ 607.597117] netif_rx_internal+0x205/0x3f0
[ 607.597127] ? do_xdp_generic+0xa50/0xa50
[ 607.597144] ? lock_downgrade+0x2b0/0x2b0
[ 607.597158] ? __lock_is_held+0x93/0x100
[ 607.597187] netif_rx+0x119/0x190
[ 607.597202] loopback_xmit+0xfd/0x1b0
[ 607.597214] dev_hard_start_xmit+0x127/0x4e0
Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices")
Fixes: b5cdae3291f7 ("net: Generic XDP")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
net/core/dev.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 1024d37..47f4864 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3981,7 +3981,13 @@ static int netif_rx_internal(struct sk_buff *skb)
trace_netif_rx(skb);
if (static_key_false(&generic_xdp_needed)) {
- int ret = do_xdp_generic(skb);
+ int ret;
+
+ preempt_disable();
+ rcu_read_lock();
+ ret = do_xdp_generic(skb);
+ rcu_read_unlock();
+ preempt_enable();
/* Consider XDP consuming the packet a success from
* the netdev point of view we do not want to count
@@ -4499,17 +4505,21 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
if (skb_defer_rx_timestamp(skb))
return NET_RX_SUCCESS;
- rcu_read_lock();
if (static_key_false(&generic_xdp_needed)) {
- int ret = do_xdp_generic(skb);
+ int ret;
- if (ret != XDP_PASS) {
- rcu_read_unlock();
+ preempt_disable();
+ rcu_read_lock();
+ ret = do_xdp_generic(skb);
+ rcu_read_unlock();
+ preempt_enable();
+
+ if (ret != XDP_PASS)
return NET_RX_DROP;
- }
}
+ rcu_read_lock();
#ifdef CONFIG_RPS
if (static_key_false(&rps_needed)) {
struct rps_dev_flow voidflow, *rflow = &voidflow;
^ permalink raw reply related
* Re: DSA support for Micrel KSZ8895
From: Pavel Machek @ 2017-08-16 18:32 UTC (permalink / raw)
To: Andrew Lunn, nathan.leigh.conrad
Cc: vivien.didelot, f.fainelli, netdev, linux-kernel, Tristram.Ha,
Woojung.Huh
In-Reply-To: <20170816140451.GA13006@lunn.ch>
[-- Attachment #1: Type: text/plain, Size: 1212 bytes --]
Hi!
> > I've got hardware with KSZ8895, and I'd like to use switch ports as
> > separate ethernet cards. I believe that means DSA support.
> >
> > And there are even patches available from microchip... unfortunately
> > they are in strange form and for v3.18.
> >
> > http://www.microchip.com/SWLibraryWeb/product.aspx?product=KSZ8895%20Software%20Linux%203.18
> >
> > Is there newer version of the driver available somewhere? Is the
> > driver good starting point, or should I start with something else?
>
> Hi Pavel
>
> Woojung is the expert here. His DSA driver for the 9477 is a nice
> clean driver.
Good.
> Have you compared the 8895 to the 9477. Are they similar? Could the
> existing 9477 be extended to support the 8895?
Register layouts are completely different, at the very least. I'm not
sure if functionality maps 1:1, that was not exactly easy to check.
I've ported the v3.18 driver to v4.4; it looks like v4.5 port will not
be easy. Plus question is if it works at all... time will tell.
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply
* Re: [net-next PATCH 00/10] BPF: sockmap and sk redirect support
From: David Miller @ 2017-08-16 18:28 UTC (permalink / raw)
To: john.fastabend; +Cc: daniel, ast, tgraf, netdev, tom
In-Reply-To: <20170816052338.15445.83732.stgit@john-Precision-Tower-5810>
From: John Fastabend <john.fastabend@gmail.com>
Date: Tue, 15 Aug 2017 22:30:15 -0700
> This series implements a sockmap and socket redirect helper for BPF
> using a model similar to XDP netdev redirect.
Series applied, thanks John.
^ permalink raw reply
* Re: [PATCH] net: 3c509: constify pnp_device_id
From: David Miller @ 2017-08-16 18:26 UTC (permalink / raw)
To: arvind.yadav.cs; +Cc: jarod, dhowells, linux-kernel, netdev
In-Reply-To: <9970265bdf2dd4c4dfbaf6690145d9726748b598.1502859040.git.arvind.yadav.cs@gmail.com>
From: Arvind Yadav <arvind.yadav.cs@gmail.com>
Date: Wed, 16 Aug 2017 10:25:59 +0530
> pnp_device_id are not supposed to change at runtime. All functions
> working with pnp_device_id provided by <linux/pnp.h> work with
> const pnp_device_id. So mark the non-const structs as const.
>
> Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Applied.
^ permalink raw reply
* [PATCH net] ipv6: reset fn->rr_ptr when replacing route
From: Wei Wang @ 2017-08-16 18:18 UTC (permalink / raw)
To: David Miller, netdev; +Cc: Eric Dumazet, Wei Wang
From: Wei Wang <weiwan@google.com>
syzcaller reported the following use-after-free issue in rt6_select():
BUG: KASAN: use-after-free in rt6_select net/ipv6/route.c:755 [inline] at addr ffff8800bc6994e8
BUG: KASAN: use-after-free in ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084 at addr ffff8800bc6994e8
Read of size 4 by task syz-executor1/439628
CPU: 0 PID: 439628 Comm: syz-executor1 Not tainted 4.3.5+ #8
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
0000000000000000 ffff88018fe435b0 ffffffff81ca384d ffff8801d3588c00
ffff8800bc699380 ffff8800bc699500 dffffc0000000000 ffff8801d40a47c0
ffff88018fe435d8 ffffffff81735751 ffff88018fe43660 ffff8800bc699380
Call Trace:
[<ffffffff81ca384d>] __dump_stack lib/dump_stack.c:15 [inline]
[<ffffffff81ca384d>] dump_stack+0xc1/0x124 lib/dump_stack.c:51
sctp: [Deprecated]: syz-executor0 (pid 439615) Use of struct sctp_assoc_value in delayed_ack socket option.
Use struct sctp_sack_info instead
[<ffffffff81735751>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
[<ffffffff817359c4>] print_address_description mm/kasan/report.c:196 [inline]
[<ffffffff817359c4>] kasan_report_error+0x1b4/0x4a0 mm/kasan/report.c:285
[<ffffffff81735d93>] kasan_report mm/kasan/report.c:305 [inline]
[<ffffffff81735d93>] __asan_report_load4_noabort+0x43/0x50 mm/kasan/report.c:325
[<ffffffff82a28e39>] rt6_select net/ipv6/route.c:755 [inline]
[<ffffffff82a28e39>] ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084
[<ffffffff82a28fb1>] ip6_pol_route_output+0x81/0xb0 net/ipv6/route.c:1203
[<ffffffff82ab0a50>] fib6_rule_action+0x1f0/0x680 net/ipv6/fib6_rules.c:95
[<ffffffff8265cbb6>] fib_rules_lookup+0x2a6/0x7a0 net/core/fib_rules.c:223
[<ffffffff82ab1430>] fib6_rule_lookup+0xd0/0x250 net/ipv6/fib6_rules.c:41
[<ffffffff82a22006>] ip6_route_output+0x1d6/0x2c0 net/ipv6/route.c:1224
[<ffffffff829e83d2>] ip6_dst_lookup_tail+0x4d2/0x890 net/ipv6/ip6_output.c:943
[<ffffffff829e889a>] ip6_dst_lookup_flow+0x9a/0x250 net/ipv6/ip6_output.c:1079
[<ffffffff82a9f7d8>] ip6_datagram_dst_update+0x538/0xd40 net/ipv6/datagram.c:91
[<ffffffff82aa0978>] __ip6_datagram_connect net/ipv6/datagram.c:251 [inline]
[<ffffffff82aa0978>] ip6_datagram_connect+0x518/0xe50 net/ipv6/datagram.c:272
[<ffffffff82aa1313>] ip6_datagram_connect_v6_only+0x63/0x90 net/ipv6/datagram.c:284
[<ffffffff8292f790>] inet_dgram_connect+0x170/0x1f0 net/ipv4/af_inet.c:564
[<ffffffff82565547>] SYSC_connect+0x1a7/0x2f0 net/socket.c:1582
[<ffffffff8256a649>] SyS_connect+0x29/0x30 net/socket.c:1563
[<ffffffff82c72032>] entry_SYSCALL_64_fastpath+0x12/0x17
Object at ffff8800bc699380, in cache ip6_dst_cache size: 384
The root cause of it is that in fib6_add_rt2node(), when it replaces an
existing route with the new one, it does not update fn->rr_ptr.
This commit resets fn->rr_ptr to NULL when it points to a route which is
replaced in fib6_add_rt2node().
Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
net/ipv6/ip6_fib.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 8c58c7558de0..78b9359b06fd 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1013,6 +1013,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
nsiblings = iter->rt6i_nsiblings;
iter->rt6i_node = NULL;
fib6_purge_rt(iter, fn, info->nl_net);
+ if (fn->rr_ptr == iter)
+ fn->rr_ptr = NULL;
rt6_release(iter);
if (nsiblings) {
@@ -1026,6 +1028,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
*ins = iter->dst.rt6_next;
iter->rt6i_node = NULL;
fib6_purge_rt(iter, fn, info->nl_net);
+ if (fn->rr_ptr == iter)
+ fn->rr_ptr = NULL;
rt6_release(iter);
nsiblings--;
} else {
--
2.14.1.480.gb18f417b89-goog
^ permalink raw reply related
* Re: [PATCH net 2/2] net: ixgbe: Use new IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING flag
From: Alexander Duyck @ 2017-08-16 18:17 UTC (permalink / raw)
To: Ding Tianhong
Cc: David Miller, Jeff Kirsher, Kees Cook,
linux-kernel@vger.kernel.org, sparclinux, intel-wired-lan, Netdev
In-Reply-To: <1502876507-9360-3-git-send-email-dingtianhong@huawei.com>
On Wed, Aug 16, 2017 at 2:41 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
> The ixgbe driver use the compile check to determine if it can
> send TLPs to Root Port with the Relaxed Ordering Attribute set,
> this is too inconvenient, now the new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING
> has been added to the kernel and we could check the bit4 in the PCIe
> Davice Control register to determine whether we should use the Relaxed
> Ordering Attributes or not, so we add a new flag which called
> IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING to the ixgbe driver, it will
> be set if the Root Port couldn't deal the upstream TLPs with Relaxed
> Ordering Attribute, then the driver could know what to do next.
>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
I would say this is a good start but you probably don't even need the
IXGBE_FLAG2 bit. The hardware will honor the PCIe configuration space.
So I would just say you should probably drop the checks all together
and just strip the #ifdef CONFIG_SPARC bits. It should make the
resultant patch quite small and have the exact same effect.
> ---
> drivers/net/ethernet/intel/ixgbe/ixgbe.h | 1 +
> drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c | 37 ++++++++++++-------------
> drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 32 +++++++++++----------
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 17 ++++++++++++
> 4 files changed, 53 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index dd55787..50e0553 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> @@ -621,6 +621,7 @@ struct ixgbe_adapter {
> #define IXGBE_FLAG2_EEE_CAPABLE BIT(14)
> #define IXGBE_FLAG2_EEE_ENABLED BIT(15)
> #define IXGBE_FLAG2_RX_LEGACY BIT(16)
> +#define IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING BIT(17)
>
> /* Tx fast path data */
> int num_tx_queues;
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
> index 523f9d0..0727a30 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
> @@ -175,31 +175,30 @@ static s32 ixgbe_init_phy_ops_82598(struct ixgbe_hw *hw)
> **/
> static s32 ixgbe_start_hw_82598(struct ixgbe_hw *hw)
> {
> -#ifndef CONFIG_SPARC
> - u32 regval;
> - u32 i;
> -#endif
> + u32 regval, i;
> s32 ret_val;
> + struct ixgbe_adapter *adapter = hw->back;
>
> ret_val = ixgbe_start_hw_generic(hw);
>
> -#ifndef CONFIG_SPARC
> - /* Disable relaxed ordering */
> - for (i = 0; ((i < hw->mac.max_tx_queues) &&
> - (i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
> - regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(i));
> - regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
> - IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(i), regval);
> - }
> + if (adapter->flags2 & IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING) {
> + /* Disable relaxed ordering */
> + for (i = 0; ((i < hw->mac.max_tx_queues) &&
> + (i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
> + regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(i));
> + regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
> + IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(i), regval);
> + }
>
> - for (i = 0; ((i < hw->mac.max_rx_queues) &&
> - (i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
> - regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
> - regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
> - IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
> - IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
> + for (i = 0; ((i < hw->mac.max_rx_queues) &&
> + (i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
> + regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
> + regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
> + IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
> + IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
> + }
> }
> -#endif
> +
> if (ret_val)
> return ret_val;
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
> index d4933d2..2473c0b 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
> @@ -342,6 +342,7 @@ s32 ixgbe_start_hw_generic(struct ixgbe_hw *hw)
> s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
> {
> u32 i;
> + struct ixgbe_adapter *adapter = hw->back;
>
> /* Clear the rate limiters */
> for (i = 0; i < hw->mac.max_tx_queues; i++) {
> @@ -350,25 +351,26 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
> }
> IXGBE_WRITE_FLUSH(hw);
>
> -#ifndef CONFIG_SPARC
> - /* Disable relaxed ordering */
> - for (i = 0; i < hw->mac.max_tx_queues; i++) {
> - u32 regval;
> + if (adapter->flags2 & IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING) {
> + /* Disable relaxed ordering */
> + for (i = 0; i < hw->mac.max_tx_queues; i++) {
> + u32 regval;
>
> - regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL_82599(i));
> - regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
> - IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL_82599(i), regval);
> - }
> + regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL_82599(i));
> + regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
> + IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL_82599(i), regval);
> + }
>
> - for (i = 0; i < hw->mac.max_rx_queues; i++) {
> - u32 regval;
> + for (i = 0; i < hw->mac.max_rx_queues; i++) {
> + u32 regval;
>
> - regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
> - regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
> - IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
> - IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
> + regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
> + regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
> + IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
> + IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
> + }
> }
> -#endif
> +
> return 0;
> }
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index f1dbdf2..f576be7 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -10081,6 +10081,23 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> goto err_ioremap;
> }
>
> + /* If possible, we use PCIe Relaxed Ordering Attribute to deliver
> + * Ingress Packet Data to Free List Buffers in order to allow for
> + * chipset performance optimizations between the Root Complex and
> + * Memory Controllers. (Messages to the associated Ingress Queue
> + * notifying new Packet Placement in the Free Lists Buffers will be
> + * send without the Relaxed Ordering Attribute thus guaranteeing that
> + * all preceding PCIe Transaction Layer Packets will be processed
> + * first.) But some Root Complexes have various issues with Upstream
> + * Transaction Layer Packets with the Relaxed Ordering Attribute set.
> + * The PCIe devices which under the Root Complexes will be cleared the
> + * Relaxed Ordering bit in the configuration space, So we check our
> + * PCIe configuration space to see if it's flagged with advice against
> + * using Relaxed Ordering.
> + */
> + if (!pcie_relaxed_ordering_enabled(pdev))
> + adapter->flags2 |= IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING;
> +
> netdev->netdev_ops = &ixgbe_netdev_ops;
> ixgbe_set_ethtool_ops(netdev);
> netdev->watchdog_timeo = 5 * HZ;
> --
> 1.8.3.1
>
>
^ permalink raw reply
* [PATCH v3] sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
From: Alexander Potapenko @ 2017-08-16 18:16 UTC (permalink / raw)
To: dvyukov, kcc, edumazet, marcelo.leitner, lucien.xin, vyasevich,
davem, hideaki.yoshifuji
Cc: linux-kernel, linux-sctp, netdev
KMSAN reported use of uninitialized sctp_addr->v4.sin_addr.s_addr and
sctp_addr->v6.sin6_scope_id in sctp_v6_cmp_addr() (see below).
Make sure all fields of an IPv6 address are initialized, which
guarantees that the IPv4 fields are also initialized.
==================================================================
BUG: KMSAN: use of uninitialized memory in sctp_v6_cmp_addr+0x8d4/0x9f0
net/sctp/ipv6.c:517
CPU: 2 PID: 31056 Comm: syz-executor1 Not tainted 4.11.0-rc5+ #2944
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
dump_stack+0x172/0x1c0 lib/dump_stack.c:42
is_logbuf_locked mm/kmsan/kmsan.c:59 [inline]
kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:938
native_save_fl arch/x86/include/asm/irqflags.h:18 [inline]
arch_local_save_flags arch/x86/include/asm/irqflags.h:72 [inline]
arch_local_irq_save arch/x86/include/asm/irqflags.h:113 [inline]
__msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:467
sctp_v6_cmp_addr+0x8d4/0x9f0 net/sctp/ipv6.c:517
sctp_v6_get_dst+0x8c7/0x1630 net/sctp/ipv6.c:290
sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
sctp_assoc_add_peer+0x66d/0x16f0 net/sctp/associola.c:651
sctp_sendmsg+0x35a5/0x4f90 net/sctp/socket.c:1871
inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg net/socket.c:643 [inline]
SYSC_sendto+0x608/0x710 net/socket.c:1696
SyS_sendto+0x8a/0xb0 net/socket.c:1664
entry_SYSCALL_64_fastpath+0x13/0x94
RIP: 0033:0x44b479
RSP: 002b:00007f6213f21c08 EFLAGS: 00000286 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 000000000044b479
RDX: 0000000000000041 RSI: 0000000020edd000 RDI: 0000000000000006
RBP: 00000000007080a8 R08: 0000000020b85fe4 R09: 000000000000001c
R10: 0000000000040005 R11: 0000000000000286 R12: 00000000ffffffff
R13: 0000000000003760 R14: 00000000006e5820 R15: 0000000000ff8000
origin description: ----dst_saddr@sctp_v6_get_dst
local variable created at:
sk_fullsock include/net/sock.h:2321 [inline]
inet6_sk include/linux/ipv6.h:309 [inline]
sctp_v6_get_dst+0x91/0x1630 net/sctp/ipv6.c:241
sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
==================================================================
BUG: KMSAN: use of uninitialized memory in sctp_v6_cmp_addr+0x8d4/0x9f0
net/sctp/ipv6.c:517
CPU: 2 PID: 31056 Comm: syz-executor1 Not tainted 4.11.0-rc5+ #2944
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
dump_stack+0x172/0x1c0 lib/dump_stack.c:42
is_logbuf_locked mm/kmsan/kmsan.c:59 [inline]
kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:938
native_save_fl arch/x86/include/asm/irqflags.h:18 [inline]
arch_local_save_flags arch/x86/include/asm/irqflags.h:72 [inline]
arch_local_irq_save arch/x86/include/asm/irqflags.h:113 [inline]
__msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:467
sctp_v6_cmp_addr+0x8d4/0x9f0 net/sctp/ipv6.c:517
sctp_v6_get_dst+0x8c7/0x1630 net/sctp/ipv6.c:290
sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
sctp_assoc_add_peer+0x66d/0x16f0 net/sctp/associola.c:651
sctp_sendmsg+0x35a5/0x4f90 net/sctp/socket.c:1871
inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg net/socket.c:643 [inline]
SYSC_sendto+0x608/0x710 net/socket.c:1696
SyS_sendto+0x8a/0xb0 net/socket.c:1664
entry_SYSCALL_64_fastpath+0x13/0x94
RIP: 0033:0x44b479
RSP: 002b:00007f6213f21c08 EFLAGS: 00000286 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 000000000044b479
RDX: 0000000000000041 RSI: 0000000020edd000 RDI: 0000000000000006
RBP: 00000000007080a8 R08: 0000000020b85fe4 R09: 000000000000001c
R10: 0000000000040005 R11: 0000000000000286 R12: 00000000ffffffff
R13: 0000000000003760 R14: 00000000006e5820 R15: 0000000000ff8000
origin description: ----dst_saddr@sctp_v6_get_dst
local variable created at:
sk_fullsock include/net/sock.h:2321 [inline]
inet6_sk include/linux/ipv6.h:309 [inline]
sctp_v6_get_dst+0x91/0x1630 net/sctp/ipv6.c:241
sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
==================================================================
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
v3: set flowinfo between port and addr to ease code review.
v2 is identical to v1, resending per request by Marcelo Ricardo Leitner.
---
net/sctp/ipv6.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 2a186b201ad2..a4b6ffb61495 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -512,7 +512,9 @@ static void sctp_v6_to_addr(union sctp_addr *addr, struct in6_addr *saddr,
{
addr->sa.sa_family = AF_INET6;
addr->v6.sin6_port = port;
+ addr->v6.sin6_flowinfo = 0;
addr->v6.sin6_addr = *saddr;
+ addr->v6.sin6_scope_id = 0;
}
/* Compare addresses exactly.
--
2.14.1.480.gb18f417b89-goog
^ permalink raw reply related
* Re: [PATCH] net: igmp: Use ingress interface rather than vrf device
From: David Miller @ 2017-08-16 18:10 UTC (permalink / raw)
To: dsahern; +Cc: netdev, nikolay
In-Reply-To: <1502847522-13020-1-git-send-email-dsahern@gmail.com>
From: David Ahern <dsahern@gmail.com>
Date: Tue, 15 Aug 2017 18:38:42 -0700
> Anuradha reported that statically added groups for interfaces enslaved
> to a VRF device were not persisting. The problem is that igmp queries
> and reports need to use the data in the in_dev for the real ingress
> device rather than the VRF device. Update igmp_rcv accordingly.
>
> Fixes: e58e41596811 ("net: Enable support for VRF with ipv4 multicast")
> Reported-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
> Signed-off-by: David Ahern <dsahern@gmail.com>
Applied and queued up for -stable, thanks.
^ permalink raw reply
* [PATCH net] ipv4: better IP_MAX_MTU enforcement
From: Eric Dumazet @ 2017-08-16 18:09 UTC (permalink / raw)
To: David Miller; +Cc: netdev
From: Eric Dumazet <edumazet@google.com>
While working on yet another syzkaller report, I found
that our IP_MAX_MTU enforcements were not properly done.
gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
final result can be bigger than IP_MAX_MTU :/
This is a problem because device mtu can be changed on other cpus or
threads.
While this patch does not fix the issue I am working on, it is
probably worth addressing it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/ip.h | 4 ++--
net/ipv4/route.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/net/ip.h b/include/net/ip.h
index 821cedcc8e73..0cf7f5a65fe6 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -352,7 +352,7 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
!forwarding)
return dst_mtu(dst);
- return min(dst->dev->mtu, IP_MAX_MTU);
+ return min(READ_ONCE(dst->dev->mtu), IP_MAX_MTU);
}
static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
@@ -364,7 +364,7 @@ static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
return ip_dst_mtu_maybe_forward(skb_dst(skb), forwarding);
}
- return min(skb_dst(skb)->dev->mtu, IP_MAX_MTU);
+ return min(READ_ONCE(skb_dst(skb)->dev->mtu), IP_MAX_MTU);
}
u32 ip_idents_reserve(u32 hash, int segs);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 7effa62beed3..fe877a4a72b1 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1267,7 +1267,7 @@ static unsigned int ipv4_mtu(const struct dst_entry *dst)
if (mtu)
return mtu;
- mtu = dst->dev->mtu;
+ mtu = READ_ONCE(dst->dev->mtu);
if (unlikely(dst_metric_locked(dst, RTAX_MTU))) {
if (rt->rt_uses_gateway && mtu > 576)
^ permalink raw reply related
* Re: [PATCH net-next] liquidio: update VF's netdev->max_mtu if there's a change in PF's MTU
From: David Miller @ 2017-08-16 18:08 UTC (permalink / raw)
To: felix.manlunas
Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
veerasenareddy.burru
In-Reply-To: <20170815232622.GA2385@felix-thinkpad.cavium.com>
From: Felix Manlunas <felix.manlunas@cavium.com>
Date: Tue, 15 Aug 2017 16:26:22 -0700
> From: Veerasenareddy Burru <veerasenareddy.burru@cavium.com>
>
> A VF's MTU is capped at the parent PF's MTU. So if there's a change in the
> PF's MTU, then update the VF's netdev->max_mtu.
>
> Also remove duplicate log messages for MTU change.
>
> Signed-off-by: Veerasenareddy Burru <veerasenareddy.burru@cavium.com>
> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH net-next 1/3] VMCI: only load on VMware hypervisor
From: David Miller @ 2017-08-16 18:06 UTC (permalink / raw)
To: decui
Cc: netdev, gregkh, devel, kys, haiyangz, sthemmin, georgezhang,
jhansen, mkubecek, asias, stefanha, vkuznets, cavery, jasowang,
rolf.neugebauer, dave.scott, marcelo.cerri, apw, olaf, joe,
linux-kernel, dan.carpenter
In-Reply-To: <KL1P15301MB00083C97259FF2A700C78442BF8D0@KL1P15301MB0008.APCP153.PROD.OUTLOOK.COM>
From: Dexuan Cui <decui@microsoft.com>
Date: Tue, 15 Aug 2017 22:13:29 +0000
> + /*
> + * Check if we are running on VMware's hypervisor and bail out
> + * if we are not.
> + */
> + if (x86_hyper != &x86_hyper_vmware)
> + return -ENODEV;
This symbol is only available when CONFIG_HYPERVISOR_GUEST is defined.
But this driver does not have a Kconfig dependency on that symbol so
the build can fail in some configurations.
^ permalink raw reply
* Re: [PATCH net-next 0/4] various sizeof cleanups
From: David Miller @ 2017-08-16 18:02 UTC (permalink / raw)
To: stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ
Cc: mlindner-eYqpPyKDWXRBDgjK7y7TUQ, mst-H+wXaHxf7aLQT0dZR+AlfA,
jasowang-H+wXaHxf7aLQT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
sthemmin-0li6OtcxBFHby3iVrkZq2A
In-Reply-To: <20170815172919.26153-1-sthemmin-0li6OtcxBFHby3iVrkZq2A@public.gmane.org>
From: Stephen Hemminger <stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org>
Date: Tue, 15 Aug 2017 10:29:15 -0700
> Noticed some places that were using sizeof as an operator.
> This is legal C but is not the convention used in the kernel.
Series applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 2/2] net_sched/hfsc: opencode trivial set_active() and set_passive()
From: David Miller @ 2017-08-16 17:57 UTC (permalink / raw)
To: khlebnikov; +Cc: netdev, xiyou.wangcong, jkosina, edumazet, jhs
In-Reply-To: <150280440342.717891.16424904669818880087.stgit@buzz>
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Date: Tue, 15 Aug 2017 16:40:03 +0300
> Any move comment abount update_vf() into right place.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Applied.
^ permalink raw reply
* Re: [PATCH 1/2] net_sched: call qlen_notify only if child qdisc is empty
From: David Miller @ 2017-08-16 17:56 UTC (permalink / raw)
To: khlebnikov; +Cc: netdev, xiyou.wangcong, jkosina, edumazet, jhs
In-Reply-To: <150280439968.717891.6476652519937001709.stgit@buzz>
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Date: Tue, 15 Aug 2017 16:39:59 +0300
> This callback is used for deactivating class in parent qdisc.
> This is cheaper to test queue length right here.
>
> Also this allows to catch draining screwed backlog and prevent
> second deactivation of already inactive parent class which will
> crash kernel for sure. Kernel with print warning at destruction
> of child qdisc where no packets but backlog is not zero.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Applied.
^ permalink raw reply
* Re: [PATCH net 0/2] net: ixgbe: Use new flag to disable Relaxed Ordering
From: David Miller @ 2017-08-16 17:56 UTC (permalink / raw)
To: dingtianhong
Cc: jeffrey.t.kirsher, keescook, linux-kernel, sparclinux,
intel-wired-lan, alexander.duyck, netdev
In-Reply-To: <1502876507-9360-1-git-send-email-dingtianhong@huawei.com>
From: Ding Tianhong <dingtianhong@huawei.com>
Date: Wed, 16 Aug 2017 17:41:45 +0800
> The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
> to indicate that Relaxed Ordering Attributes (RO) should not
> be used for Transaction Layer Packets (TLP) targeted toward
> these affected Root Port, it will clear the bit4 in the PCIe
> Device Control register, so the PCIe device drivers could
> query PCIe configuration space to determine if it can send
> TLPs to Root Port with the Relaxed Ordering Attributes set.
>
> The ixgbe driver could use this flag to determine if it can
> send TLPs to Root Port with the Relaxed Ordering Attributes set.
I'll let the Intel guys pick this up.
^ permalink raw reply
* [PATCH v2] i40e/i40evf: use cpumask_copy() for assigning cpumask
From: Juergen Gross @ 2017-08-16 17:52 UTC (permalink / raw)
To: linux-kernel, netdev, intel-wired-lan
Cc: jeffrey.t.kirsher, Juergen Gross, stable
Using direct assignment for a cpumask is wrong, cpumask_copy() should
be used instead.
Otherwise crashes like the following might happen:
[62792.326374] BUG: unable to handle kernel paging request at ffff8800049ff000
[62792.340118] IP: [<ffffffffa2043341>] i40e_irq_affinity_notify+0x11/0x20 [i40e]
...
[62792.810770] Call Trace:
[62792.815722] [<ffffffff810d77a5>] irq_affinity_notify+0xb5/0xf0
[62792.827593] [<ffffffff8109593e>] process_one_work+0x14e/0x410
[62792.839282] [<ffffffff81096196>] worker_thread+0x116/0x490
[62792.850459] [<ffffffff8109b667>] kthread+0xc7/0xe0
[62792.860255] [<ffffffff816094bf>] ret_from_fork+0x3f/0x70
[62792.871996] DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70
Fixes: 96db776a3682 ("i40e/i40evf: fix interrupt affinity bug")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2: enhance commit message, merge patches
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
drivers/net/ethernet/intel/i40evf/i40evf_main.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 2db93d3f6d23..c0e42d162c7c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3495,7 +3495,7 @@ static void i40e_irq_affinity_notify(struct irq_affinity_notify *notify,
struct i40e_q_vector *q_vector =
container_of(notify, struct i40e_q_vector, affinity_notify);
- q_vector->affinity_mask = *mask;
+ cpumask_copy(&q_vector->affinity_mask, mask);
}
/**
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 7c213a347909..a4b60367ecce 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -520,7 +520,7 @@ static void i40evf_irq_affinity_notify(struct irq_affinity_notify *notify,
struct i40e_q_vector *q_vector =
container_of(notify, struct i40e_q_vector, affinity_notify);
- q_vector->affinity_mask = *mask;
+ cpumask_copy(&q_vector->affinity_mask, mask);
}
/**
--
2.12.3
^ permalink raw reply related
* Re: [patch net-next repost 2/3] net/sched: Change cls_flower to use IDR
From: David Miller @ 2017-08-16 17:50 UTC (permalink / raw)
To: xiyou.wangcong; +Cc: chrism, netdev, jhs, jiri, mawilcox
In-Reply-To: <CAM_iQpXOKrT658X1ADfz=WmKryqUgzfOwUF6uTnPNG00EPOQdg@mail.gmail.com>
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Wed, 16 Aug 2017 10:42:01 -0700
> Also, _possibly_ we may make the tc filter handle generator
> generic?
Yes, this is a good idea.
^ permalink raw reply
* Stuck TX Using Iperf
From: Robert Jones @ 2017-08-16 17:46 UTC (permalink / raw)
To: netdev, sgoutham, rric; +Cc: Tim Harvey
Hello Cavium Ethernet Driver Maintainers,
I'm working on a custom board using a Cavium OcteonTX CN80XX cpu
running a mainline 4.12.7 kernel and I've run into a problem where the
TX of my BGX0 configured for SGMII becomes stuck.
After boot I'm able to bring the interface up, run dhclient, and ping
a host machine with no issues. However, every time that I try to run
an iperf TCP test I get a stuck TX queue. Bringing the interface down
and then up does not resolve the problem, but physically reconnecting
the cable connected to the interface does. Also, after I reconnect the
cable I am no longer able to reproduce the stuck TX queue until
reboot. I receive no kernel driver messages of any sort during my
iperf test, it just stalls until I kill it.
Any help would be appreciated.
Regards,
Robert Jones - Software Engineer
Gateworks Corporation
^ permalink raw reply
* Re: [PATCH net-next 3/3 v4] drivers: net: ethernet: qualcomm: rmnet: Initial implementation
From: Subash Abhinov Kasiviswanathan @ 2017-08-16 17:46 UTC (permalink / raw)
To: Jiri Pirko; +Cc: netdev, davem, fengguang.wu, dcbw, stephen
In-Reply-To: <20170816113017.GJ1868@nanopsycho>
>> +struct rmnet_phys_ep_conf_s {
>
> The name is cryptic. Why "_s"?
>> +
>> +struct rmnet_vnd_private_s {
>
> Again, cryptic.
>
>
>> + struct rmnet_logical_ep_conf_s local_ep;
>> + u32 msg_enable;
>> +};
Hi Jiri
The _s was to indicate struct. I'll remove that.
>> +rx_handler_result_t rmnet_ingress_handler(struct sk_buff *skb)
>> +{
>> + struct rmnet_phys_ep_conf_s *config;
>
> I still fail to understand why the name of this is "config". Please
> change to something else across whole code. Including the name of the
> struct.
>
>> + struct net_device *dev;
>> + int rc;
>> +
>> + if (!skb)
>> + return RX_HANDLER_CONSUMED;
>> +
>> + dev = skb->dev;
>> + config = rmnet_get_phys_ep_config(skb->dev);
>
> You have dev. Why not use dev?
>
>> +rx_handler_result_t rmnet_rx_handler(struct sk_buff **pskb)
>> +{
>> + return rmnet_ingress_handler(*pskb);
>
> This is just silly. Why you don't have the content of
> rmnet_ingress_handler
> right here?
I'll make these changes now.
^ permalink raw reply
* Re: [patch net-next repost 2/3] net/sched: Change cls_flower to use IDR
From: Cong Wang @ 2017-08-16 17:42 UTC (permalink / raw)
To: Chris Mi
Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
David Miller, mawilcox
In-Reply-To: <1502871244-19870-3-git-send-email-chrism@mellanox.com>
On Wed, Aug 16, 2017 at 1:14 AM, Chris Mi <chrism@mellanox.com> wrote:
> Currently, all filters with the same priority are linked in a doubly
> linked list. Every filter should have a unique handle. To make the
> handle unique, we need to iterate the list every time to see if the
> handle exists or not when inserting a new filter. It is time-consuming.
> For example, it takes about 5m3.169s to insert 64K rules.
>
> This patch changes cls_flower to use IDR. With this patch, it
> takes about 0m1.127s to insert 64K rules. The improvement is huge.
>
> But please note that in this testing, all filters share the same action.
> If every filter has a unique action, that is another bottleneck.
> Follow-up patch in this patchset addresses that.
>
> Signed-off-by: Chris Mi <chrism@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
While you are on it, why only migrate cls_flower? I am sure
we have similar handle generators in other tc flters too.
Also, _possibly_ we may make the tc filter handle generator
generic?
Thanks.
^ permalink raw reply
* Re: [PATCH net-next 3/3 v4] drivers: net: ethernet: qualcomm: rmnet: Initial implementation
From: Subash Abhinov Kasiviswanathan @ 2017-08-16 17:38 UTC (permalink / raw)
To: David Miller, David Laight; +Cc: netdev, fengguang.wu, dcbw, jiri, stephen
In-Reply-To: <20170815.212414.71937112074428743.davem@davemloft.net>
>> > +static int rmnet_unregister_real_device(struct net_device *dev)
>> > +{
>> > + int config_id = RMNET_LOCAL_LOGICAL_ENDPOINT;
>> > + struct rmnet_logical_ep_conf_s *epconfig_l;
>> > + struct rmnet_phys_ep_conf_s *config;
>> > +
>> > + ASSERT_RTNL();
>> > +
>> > + netdev_info(dev, "Removing device %s\n", dev->name);
>> > +
>> > + if (!rmnet_is_real_dev_registered(dev))
>> > + return -EINVAL;
>> > +
>> > + for (; config_id < RMNET_MAX_LOGICAL_EP; config_id++) {
>>
>> This loop is so much harder to understand because you initialize
>> the loop index several lines above the for() statement. Just
>> initialize it here at the beginning of the for() loop and deal
>> with the fact that this will have to therefore be a multi-line
>> for() statement the best you can.
> ...
>
> One way to split the multi-line for() is to put the initialiser
> on the immediately preceding line:
> config_id = RMNET_LOCAL_LOGICAL_ENDPOINT;
> for (; config_id < RMNET_MAX_LOGICAL_EP; config_id++) {
>
> David
Hi DaveM / David
I'll move the initialization as above.
>
>> +static int rmnet_set_egress_data_format(struct net_device *dev, u32
>> edf,
>> + u16 agg_size, u16 agg_count)
>
> Use a space instead of a TAB character before the "u32 edf," argument.
>
I'll fix this.
>> +static int
>> +__rmnet_set_logical_endpoint_config(struct net_device *dev,
>> + int config_id,
>> + struct rmnet_logical_ep_conf_s *epconfig)
>> +{
>> + struct rmnet_logical_ep_conf_s *epconfig_l;
>> +
>> + ASSERT_RTNL();
>> +
>> + epconfig_l = rmnet_get_logical_ep(dev, config_id);
>> +
>> + if (!epconfig_l || epconfig_l->refcount)
>> + return -EINVAL;
>> +
>> + memcpy(epconfig_l, epconfig, sizeof(struct
>> rmnet_logical_ep_conf_s));
>> + if (config_id == RMNET_LOCAL_LOGICAL_ENDPOINT)
>> + epconfig_l->mux_id = 0;
>> + else
>> + epconfig_l->mux_id = config_id;
>> +
>> + dev_hold(epconfig_l->egress_dev);
>> + return 0;
>> +}
>
> This looks really messy.
>
> One invariant that must hold is that if I try to take down netdev
> "X", all resources holding a reference to X will be immediately
> (or quickly) dropped when that request comes in via notifiers.
>
> Will this happen properly for this egress_dev reference counting?
>
>> +static int rmnet_config_notify_cb(struct notifier_block *nb,
>> + unsigned long event, void *data)
>> +{
>> + struct net_device *dev = netdev_notifier_info_to_dev(data);
>> +
>> + if (!dev)
>> + return NOTIFY_DONE;
>> +
>> + switch (event) {
>> + case NETDEV_UNREGISTER_FINAL:
>> + case NETDEV_UNREGISTER:
>> + netdev_info(dev, "Kernel unregister\n");
>> + rmnet_force_unassociate_device(dev);
>> + break;
>
> So I guess it happens here?
Yes, all the rmnet devices are removed from the real_dev
in the notifier callback.
^ permalink raw reply
* [PATCH net] ptr_ring: use kmalloc_array()
From: Eric Dumazet @ 2017-08-16 17:36 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Michael S. Tsirkin, Jason Wang
From: Eric Dumazet <edumazet@google.com>
As found by syzkaller, malicious users can set whatever tx_queue_len
on a tun device and eventually crash the kernel.
Lets remove the ALIGN(XXX, SMP_CACHE_BYTES) thing since a small
ring buffer is not fast anyway.
Fixes: 2e0ab8ca83c1 ("ptr_ring: array based FIFO for pointers")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
---
include/linux/ptr_ring.h | 9 +++++----
include/linux/skb_array.h | 3 ++-
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index d8c97ec8a8e6..37b4bb2545b3 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -436,9 +436,9 @@ static inline int ptr_ring_consume_batched_bh(struct ptr_ring *r,
__PTR_RING_PEEK_CALL_v; \
})
-static inline void **__ptr_ring_init_queue_alloc(int size, gfp_t gfp)
+static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
{
- return kzalloc(ALIGN(size * sizeof(void *), SMP_CACHE_BYTES), gfp);
+ return kcalloc(size, sizeof(void *), gfp);
}
static inline void __ptr_ring_set_size(struct ptr_ring *r, int size)
@@ -582,7 +582,8 @@ static inline int ptr_ring_resize(struct ptr_ring *r, int size, gfp_t gfp,
* In particular if you consume ring in interrupt or BH context, you must
* disable interrupts/BH when doing so.
*/
-static inline int ptr_ring_resize_multiple(struct ptr_ring **rings, int nrings,
+static inline int ptr_ring_resize_multiple(struct ptr_ring **rings,
+ unsigned int nrings,
int size,
gfp_t gfp, void (*destroy)(void *))
{
@@ -590,7 +591,7 @@ static inline int ptr_ring_resize_multiple(struct ptr_ring **rings, int nrings,
void ***queues;
int i;
- queues = kmalloc(nrings * sizeof *queues, gfp);
+ queues = kmalloc_array(nrings, sizeof(*queues), gfp);
if (!queues)
goto noqueues;
diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
index 35226cd4efb0..8621ffdeecbf 100644
--- a/include/linux/skb_array.h
+++ b/include/linux/skb_array.h
@@ -193,7 +193,8 @@ static inline int skb_array_resize(struct skb_array *a, int size, gfp_t gfp)
}
static inline int skb_array_resize_multiple(struct skb_array **rings,
- int nrings, int size, gfp_t gfp)
+ int nrings, unsigned int size,
+ gfp_t gfp)
{
BUILD_BUG_ON(offsetof(struct skb_array, ring));
return ptr_ring_resize_multiple((struct ptr_ring **)rings,
^ permalink raw reply related
* Re: [PATCH 1/2] net_sched: call qlen_notify only if child qdisc is empty
From: Konstantin Khlebnikov @ 2017-08-16 17:32 UTC (permalink / raw)
To: Cong Wang
Cc: Linux Kernel Network Developers, David S. Miller, Jiri Kosina,
Eric Dumazet, Jamal Hadi Salim
In-Reply-To: <CAM_iQpWmJsMWrqoA3h1eLkssaMXOc6kbsj75S9mdSokFKunumw@mail.gmail.com>
On 16.08.2017 20:22, Cong Wang wrote:
> On Tue, Aug 15, 2017 at 6:39 AM, Konstantin Khlebnikov
> <khlebnikov@yandex-team.ru> wrote:
>> This callback is used for deactivating class in parent qdisc.
>> This is cheaper to test queue length right here.
>>
>> Also this allows to catch draining screwed backlog and prevent
>> second deactivation of already inactive parent class which will
>> crash kernel for sure. Kernel with print warning at destruction
>> of child qdisc where no packets but backlog is not zero.
>
> Good cleanup. Please explicitly mark patches like this as
> for net-next.
>
> Just one comment below.
>
>
>> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
>> index bd24a550e0f9..18da45c0769c 100644
>> --- a/net/sched/sch_api.c
>> +++ b/net/sched/sch_api.c
>> @@ -752,6 +752,7 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, unsigned int n,
>> const struct Qdisc_class_ops *cops;
>> unsigned long cl;
>> u32 parentid;
>> + bool notify;
>> int drops;
>>
>> if (n == 0 && len == 0)
>> @@ -764,6 +765,13 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, unsigned int n,
>>
>> if (sch->flags & TCQ_F_NOPARENT)
>> break;
>> + /* Notify parent qdisc only if child qdisc becomes empty.
>> + *
>> + * If child was empty even before update then backlog
>> + * counter is screwed and we skip notification because
>> + * parent class is already passive.
>> + */
>> + notify = !sch->q.qlen && !WARN_ON_ONCE(!n);
>
> Since 'n' never changes in this function, can we just move
> this WARN_ON_ONCE() right before the loop??
>
Here warning evaluated only if qlen is zero.
'n' could be zero not together with non-zero 'len' and zero 'qlen'.
This means queue already was empty before reduce but caller somehow reduced it even further.
^ permalink raw reply
* Re: [PATCH 1/2] net_sched: call qlen_notify only if child qdisc is empty
From: Cong Wang @ 2017-08-16 17:22 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Linux Kernel Network Developers, David S. Miller, Jiri Kosina,
Eric Dumazet, Jamal Hadi Salim
In-Reply-To: <150280439968.717891.6476652519937001709.stgit@buzz>
On Tue, Aug 15, 2017 at 6:39 AM, Konstantin Khlebnikov
<khlebnikov@yandex-team.ru> wrote:
> This callback is used for deactivating class in parent qdisc.
> This is cheaper to test queue length right here.
>
> Also this allows to catch draining screwed backlog and prevent
> second deactivation of already inactive parent class which will
> crash kernel for sure. Kernel with print warning at destruction
> of child qdisc where no packets but backlog is not zero.
Good cleanup. Please explicitly mark patches like this as
for net-next.
Just one comment below.
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index bd24a550e0f9..18da45c0769c 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -752,6 +752,7 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, unsigned int n,
> const struct Qdisc_class_ops *cops;
> unsigned long cl;
> u32 parentid;
> + bool notify;
> int drops;
>
> if (n == 0 && len == 0)
> @@ -764,6 +765,13 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, unsigned int n,
>
> if (sch->flags & TCQ_F_NOPARENT)
> break;
> + /* Notify parent qdisc only if child qdisc becomes empty.
> + *
> + * If child was empty even before update then backlog
> + * counter is screwed and we skip notification because
> + * parent class is already passive.
> + */
> + notify = !sch->q.qlen && !WARN_ON_ONCE(!n);
Since 'n' never changes in this function, can we just move
this WARN_ON_ONCE() right before the loop??
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox