Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] ucc_geth: Add BQL support
From: David Miller @ 2018-06-20  5:39 UTC (permalink / raw)
  To: joakim.tjernlund; +Cc: leoyang.li, netdev
In-Reply-To: <20180619163036.20578-1-joakim.tjernlund@infinera.com>

From: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Date: Tue, 19 Jun 2018 18:30:36 +0200

> @@ -3242,6 +3243,8 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
>  	struct ucc_geth_private *ugeth = netdev_priv(dev);
>  	u8 __iomem *bd;		/* BD pointer */
>  	u32 bd_status;
> +	int howmany = 0;
> +	unsigned int bytes_sent = 0;

Please keep the function local variable declarations ordered from
longest to shortest line.

Thank you.

^ permalink raw reply

* Re: [PATCH v1 net] stmmac: fix DMA channel hang in half-duplex mode
From: David Miller @ 2018-06-20  5:36 UTC (permalink / raw)
  To: vbhadram; +Cc: peppe.cavallaro, alexandre.torgue, joabreu, netdev
In-Reply-To: <1529245925-764-1-git-send-email-vbhadram@nvidia.com>

From: Bhadram Varka <vbhadram@nvidia.com>
Date: Sun, 17 Jun 2018 20:02:05 +0530

> HW does not support Half-duplex mode in multi-queue
> scenario. Fix it by not advertising the Half-Duplex
> mode if multi-queue enabled.
> 
> Signed-off-by: Bhadram Varka <vbhadram@nvidia.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH] net: stmmac: socfpga: add additional ocp reset line for Stratix10
From: David Miller @ 2018-06-20  5:34 UTC (permalink / raw)
  To: dinguyen; +Cc: netdev, joabreu, alexandre.torgue, peppe.cavallaro, linux-kernel
In-Reply-To: <1529422538-8447-1-git-send-email-dinguyen@kernel.org>

From: Dinh Nguyen <dinguyen@kernel.org>
Date: Tue, 19 Jun 2018 10:35:38 -0500

> The Stratix10 platform has an additional reset line, OCP(Open Core Protocol),
> that also needs to get deasserted for the stmmac ethernet controller to work.
> Thus we need to update the Kconfig to include ARCH_STRATIX10 in order to build
> dwmac-socfpga.
> 
> Also, remove the redundant check for the reset controller pointer. The
> reset driver already checks for the pointer and returns 0 if the pointer
> is NULL.
> 
> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>

Applied.

^ permalink raw reply

* general protection fault in tls_push_sg
From: syzbot @ 2018-06-20  5:34 UTC (permalink / raw)
  To: aviadye, borisp, davejwatson, davem, linux-kernel, netdev,
	syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    ba4dbdedd3ed Merge tag 'jfs-4.18' of git://github.com/klei..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=112e9ce4400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=f390986c4f7cd566
dashboard link: https://syzkaller.appspot.com/bug?extid=54bcc120da8da091d609
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+54bcc120da8da091d609@syzkaller.appspotmail.com

netlink: 8 bytes leftover after parsing attributes in process  
`syz-executor0'.
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
CPU: 1 PID: 27979 Comm: syz-executor6 Not tainted 4.18.0-rc1+ #109
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:__read_once_size include/linux/compiler.h:188 [inline]
RIP: 0010:compound_head include/linux/page-flags.h:142 [inline]
RIP: 0010:put_page include/linux/mm.h:911 [inline]
RIP: 0010:tls_push_sg+0x2a3/0x880 net/tls/tls_main.c:142
Code: fa 4d 39 e5 75 a2 e8 bc 50 f1 fa 48 8b 85 08 ff ff ff 49 8d 7f 08 48  
b9 00 00 00 00 00 fc ff df c6 00 00 48 89 f8 48 c1 e8 03 <80> 3c 08 00 0f  
85 50 05 00 00 48 8b 85 08 ff ff ff 49 8b 5f 08 80
RSP: 0018:ffff8801c5776d90 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffffffff868a59e4 RDI: 0000000000000008
RBP: ffff8801c5776eb0 R08: ffff88018e4fc6c0 R09: ffff8801c5776668
R10: 0000000000000003 R11: 0000000000000002 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f2d08c17700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000001ffffcc0 CR3: 0000000188ce8000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  tls_push_record+0xaec/0x1400 net/tls/tls_sw.c:264
  tls_sw_push_pending_record+0x22/0x30 net/tls/tls_sw.c:276
  tls_handle_open_record net/tls/tls_main.c:164 [inline]
  tls_sk_proto_close+0x74c/0xae0 net/tls/tls_main.c:264
  inet_release+0x104/0x1f0 net/ipv4/af_inet.c:427
  inet6_release+0x50/0x70 net/ipv6/af_inet6.c:459
  __sock_release+0xd7/0x260 net/socket.c:603
  sock_close+0x19/0x20 net/socket.c:1186
  __fput+0x35b/0x8b0 fs/file_table.c:209
  ____fput+0x15/0x20 fs/file_table.c:243
  task_work_run+0x1ec/0x2a0 kernel/task_work.c:113
  exit_task_work include/linux/task_work.h:22 [inline]
  do_exit+0x1b08/0x2750 kernel/exit.c:865
  do_group_exit+0x177/0x440 kernel/exit.c:968
  get_signal+0x88e/0x1970 kernel/signal.c:2468
  do_signal+0x9c/0x21c0 arch/x86/kernel/signal.c:816
  exit_to_usermode_loop+0x2de/0x370 arch/x86/entry/common.c:162
  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
  do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455b29
Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f2d08c16ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 000000000072bec8 RCX: 0000000000455b29
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072bec8
RBP: 000000000072bec8 R08: 0000000000000033 R09: 000000000072bea0
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000a3e81f R14: 00007f2d08c179c0 R15: 0000000000000000
Modules linked in:
Dumping ftrace buffer:
    (ftrace buffer empty)
---[ end trace d9dfd7279b1a9c99 ]---
RIP: 0010:__read_once_size include/linux/compiler.h:188 [inline]
RIP: 0010:compound_head include/linux/page-flags.h:142 [inline]
RIP: 0010:put_page include/linux/mm.h:911 [inline]
RIP: 0010:tls_push_sg+0x2a3/0x880 net/tls/tls_main.c:142


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

^ permalink raw reply

* Re: [PATCH net] ipvlan: call dev_change_flags when reset ipvlan mode
From: David Miller @ 2018-06-20  5:31 UTC (permalink / raw)
  To: liuhangbin; +Cc: xiyou.wangcong, netdev, sbrivio, pabeni, maheshb
In-Reply-To: <20180620032254.GW8958@leo.usersys.redhat.com>

From: Hangbin Liu <liuhangbin@gmail.com>
Date: Wed, 20 Jun 2018 11:22:54 +0800

> The only case dev_change_flags() return an err is when we change IFF_UP flag.
> Since we only set/reset IFF_NOARP, do you think we still need to check the
> return value?

It is bad to try and take shortcuts on error handling using assumptions
like that.

If dev_change_flags() is adjusted to return error codes in more
situations, nobody is going to remember to undo your "optimziation"
here.

Please check for errors, thank you.

^ permalink raw reply

* Re: [PATCH net] net: sungem: fix rx checksum support
From: David Miller @ 2018-06-20  5:30 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, mroos, malat, schwab, eric.dumazet
In-Reply-To: <20180620021850.211683-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Tue, 19 Jun 2018 19:18:50 -0700

> After commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE
> are friends"), sungem owners reported the infamous "eth0: hw csum failure"
> message.
> 
> CHECKSUM_COMPLETE has in fact never worked for this driver, but this
> was masked by the fact that upper stacks had to strip the FCS, and
> therefore skb->ip_summed was set back to CHECKSUM_NONE before
> my recent change.
> 
> Driver configures a number of bytes to skip when the chip computes
> the checksum, and for some reason only half of the Ethernet header
> was skipped.
> 
> Then a second problem is that we should strip the FCS by default,
> unless the driver is updated to eventually support NETIF_F_RXFCS in
> the future.
> 
> Finally, a driver should check if NETIF_F_RXCSUM feature is enabled
> or not, so that the admin can turn off rx checksum if wanted.
> 
> Many thanks to Andreas Schwab and Mathieu Malaterre for their
> help in debugging this issue.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Meelis Roos <mroos@linux.ee>
> Reported-by: Mathieu Malaterre <malat@debian.org>
> Reported-by: Andreas Schwab <schwab@linux-m68k.org>
> Tested-by: Andreas Schwab <schwab@linux-m68k.org>

Applied and queued up for -stable, thanks Eric.

^ permalink raw reply

* Re: [PATCH net v3 2/2] ipv4: igmp: use alarmtimer to prevent delayed reports
From: kbuild test robot @ 2018-06-20  5:09 UTC (permalink / raw)
  To: Tejaswi Tanikella; +Cc: kbuild-all, netdev, f.fainelli, andrew, davem
In-Reply-To: <20180611134619.GA28666@tejaswit-linux.qualcomm.com>

[-- Attachment #1: Type: text/plain, Size: 2779 bytes --]

Hi Tejaswi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net/master]

url:    https://github.com/0day-ci/linux/commits/Tejaswi-Tanikella/ktime-helpers-to-convert-between-ktime-and-jiffies/20180611-214916
config: x86_64-randconfig-s4-06200944 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from include/linux/timer.h:6:0,
                    from include/linux/workqueue.h:9,
                    from include/linux/srcu.h:34,
                    from include/linux/notifier.h:16,
                    from include/linux/memory_hotplug.h:7,
                    from include/linux/mmzone.h:777,
                    from include/linux/gfp.h:6,
                    from include/linux/umh.h:4,
                    from include/linux/kmod.h:22,
                    from include/linux/module.h:13,
                    from net/ipv4/igmp.c:73:
   net/ipv4/igmp.c: In function 'igmp_mc_seq_show':
>> net/ipv4/igmp.c:2819:28: error: implicit declaration of function 'alarm_expires_remaining'; did you mean 'hrtimer_expires_remaining'? [-Werror=implicit-function-declaration]
      delta = ktime_to_jiffies(alarm_expires_remaining(&im->alarm));
                               ^
   include/linux/ktime.h:100:48: note: in definition of macro 'ktime_to_jiffies'
    #define ktime_to_jiffies(kt)  nsecs_to_jiffies(kt)
                                                   ^~
>> net/ipv4/igmp.c:2819:55: error: 'struct ip_mc_list' has no member named 'alarm'
      delta = ktime_to_jiffies(alarm_expires_remaining(&im->alarm));
                                                          ^
   include/linux/ktime.h:100:48: note: in definition of macro 'ktime_to_jiffies'
    #define ktime_to_jiffies(kt)  nsecs_to_jiffies(kt)
                                                   ^~
   cc1: some warnings being treated as errors

vim +2819 net/ipv4/igmp.c

  2813	
  2814			if (rcu_access_pointer(state->in_dev->mc_list) == im) {
  2815				seq_printf(seq, "%d\t%-10s: %5d %7s\n",
  2816					   state->dev->ifindex, state->dev->name, state->in_dev->mc_count, querier);
  2817			}
  2818	
> 2819			delta = ktime_to_jiffies(alarm_expires_remaining(&im->alarm));
  2820			seq_printf(seq,
  2821				   "\t\t\t\t%08X %5d %d:%08lX\t\t%d\n",
  2822				   im->multiaddr, im->users,
  2823				   im->tm_running,
  2824				   im->tm_running ? jiffies_delta_to_clock_t(delta) : 0,
  2825				   im->reporter);
  2826		}
  2827		return 0;
  2828	}
  2829	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25732 bytes --]

^ permalink raw reply

* Re: [PATCH v2] net: dsa: drop some VLAs in switch.c
From: Kees Cook @ 2018-06-20  4:43 UTC (permalink / raw)
  To: Salvatore Mesoraca
  Cc: Andrew Lunn, LKML, Kernel Hardening, Network Development,
	David S. Miller, Florian Fainelli, Vivien Didelot, David Laight
In-Reply-To: <1525706596-13601-1-git-send-email-s.mesoraca16@gmail.com>

On Mon, May 7, 2018 at 8:23 AM, Salvatore Mesoraca
<s.mesoraca16@gmail.com> wrote:
> We avoid 2 VLAs by using a pre-allocated field in dsa_switch.
> We also try to avoid dynamic allocation whenever possible.
>
> Link: http://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
> Link: http://lkml.kernel.org/r/20180505185145.GB32630@lunn.ch
>
> Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

Friendly ping. What's needed to take this into the tree? It looks like
all the issues in v1 were addressed here.

Thanks!

-Kees

> ---
>  include/net/dsa.h |  3 +++
>  net/dsa/dsa2.c    | 14 ++++++++++++++
>  net/dsa/switch.c  | 22 ++++++++++------------
>  3 files changed, 27 insertions(+), 12 deletions(-)
>
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index 60fb4ec..576791d 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -256,6 +256,9 @@ struct dsa_switch {
>         /* Number of switch port queues */
>         unsigned int            num_tx_queues;
>
> +       unsigned long           *bitmap;
> +       unsigned long           _bitmap;
> +
>         /* Dynamically allocated ports, keep last */
>         size_t num_ports;
>         struct dsa_port ports[];
> diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
> index adf50fb..cebf35f0 100644
> --- a/net/dsa/dsa2.c
> +++ b/net/dsa/dsa2.c
> @@ -748,6 +748,20 @@ struct dsa_switch *dsa_switch_alloc(struct device *dev, size_t n)
>         if (!ds)
>                 return NULL;
>
> +       /* We avoid allocating memory outside dsa_switch
> +        * if it is not needed.
> +        */
> +       if (n <= sizeof(ds->_bitmap) * 8) {
> +               ds->bitmap = &ds->_bitmap;
> +       } else {
> +               ds->bitmap = devm_kzalloc(dev,
> +                                         BITS_TO_LONGS(n) *
> +                                               sizeof(unsigned long),
> +                                         GFP_KERNEL);
> +               if (unlikely(!ds->bitmap))
> +                       return NULL;
> +       }
> +
>         ds->dev = dev;
>         ds->num_ports = n;
>
> diff --git a/net/dsa/switch.c b/net/dsa/switch.c
> index b935117..142b294 100644
> --- a/net/dsa/switch.c
> +++ b/net/dsa/switch.c
> @@ -136,21 +136,20 @@ static int dsa_switch_mdb_add(struct dsa_switch *ds,
>  {
>         const struct switchdev_obj_port_mdb *mdb = info->mdb;
>         struct switchdev_trans *trans = info->trans;
> -       DECLARE_BITMAP(group, ds->num_ports);
>         int port;
>
>         /* Build a mask of Multicast group members */
> -       bitmap_zero(group, ds->num_ports);
> +       bitmap_zero(ds->bitmap, ds->num_ports);
>         if (ds->index == info->sw_index)
> -               set_bit(info->port, group);
> +               set_bit(info->port, ds->bitmap);
>         for (port = 0; port < ds->num_ports; port++)
>                 if (dsa_is_dsa_port(ds, port))
> -                       set_bit(port, group);
> +                       set_bit(port, ds->bitmap);
>
>         if (switchdev_trans_ph_prepare(trans))
> -               return dsa_switch_mdb_prepare_bitmap(ds, mdb, group);
> +               return dsa_switch_mdb_prepare_bitmap(ds, mdb, ds->bitmap);
>
> -       dsa_switch_mdb_add_bitmap(ds, mdb, group);
> +       dsa_switch_mdb_add_bitmap(ds, mdb, ds->bitmap);
>
>         return 0;
>  }
> @@ -204,21 +203,20 @@ static int dsa_switch_vlan_add(struct dsa_switch *ds,
>  {
>         const struct switchdev_obj_port_vlan *vlan = info->vlan;
>         struct switchdev_trans *trans = info->trans;
> -       DECLARE_BITMAP(members, ds->num_ports);
>         int port;
>
>         /* Build a mask of VLAN members */
> -       bitmap_zero(members, ds->num_ports);
> +       bitmap_zero(ds->bitmap, ds->num_ports);
>         if (ds->index == info->sw_index)
> -               set_bit(info->port, members);
> +               set_bit(info->port, ds->bitmap);
>         for (port = 0; port < ds->num_ports; port++)
>                 if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
> -                       set_bit(port, members);
> +                       set_bit(port, ds->bitmap);
>
>         if (switchdev_trans_ph_prepare(trans))
> -               return dsa_switch_vlan_prepare_bitmap(ds, vlan, members);
> +               return dsa_switch_vlan_prepare_bitmap(ds, vlan, ds->bitmap);
>
> -       dsa_switch_vlan_add_bitmap(ds, vlan, members);
> +       dsa_switch_vlan_add_bitmap(ds, vlan, ds->bitmap);
>
>         return 0;
>  }
> --
> 1.9.1
>



-- 
Kees Cook
Pixel Security

^ permalink raw reply

* [PATCH net-next] tcp: ignore rcv_rtt sample with old ts ecr value
From: Wei Wang @ 2018-06-20  4:42 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: Eric Dumazet, Neal Cardwell, Wei Wang

From: Wei Wang <weiwan@google.com>

When receiving multiple packets with the same ts ecr value, only try
to compute rcv_rtt sample with the earliest received packet.
This is because the rcv_rtt calculated by later received packets
could possibly include long idle time or other types of delay.
For example:
(1) server sends last packet of reply with TS val V1
(2) client ACKs last packet of reply with TS ecr V1
(3) long idle time passes
(4) client sends next request data packet with TS ecr V1 (again!)
At this time, the rcv_rtt computed on server with TS ecr V1 will be
inflated with the idle time and should get ignored.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/tcp.h  |  1 +
 net/ipv4/tcp.c       |  1 +
 net/ipv4/tcp_input.c | 14 +++++++++++---
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 72705eaf4b84..3dbea6610304 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -350,6 +350,7 @@ struct tcp_sock {
 #endif
 
 /* Receiver side RTT estimation */
+	u32 rcv_rtt_last_tsecr;
 	struct {
 		u32	rtt_us;
 		u32	seq;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 141acd92e58a..47c45d5be9f9 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2563,6 +2563,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	sk->sk_shutdown = 0;
 	sock_reset_flag(sk, SOCK_DONE);
 	tp->srtt_us = 0;
+	tp->rcv_rtt_last_tsecr = 0;
 	tp->write_seq += tp->max_window + 2;
 	if (tp->write_seq == 0)
 		tp->write_seq = 1;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 355d3dffd021..76ca88f63b70 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -582,9 +582,12 @@ static inline void tcp_rcv_rtt_measure_ts(struct sock *sk,
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	if (tp->rx_opt.rcv_tsecr &&
-	    (TCP_SKB_CB(skb)->end_seq -
-	     TCP_SKB_CB(skb)->seq >= inet_csk(sk)->icsk_ack.rcv_mss)) {
+	if (tp->rx_opt.rcv_tsecr == tp->rcv_rtt_last_tsecr)
+		return;
+	tp->rcv_rtt_last_tsecr = tp->rx_opt.rcv_tsecr;
+
+	if (TCP_SKB_CB(skb)->end_seq -
+	    TCP_SKB_CB(skb)->seq >= inet_csk(sk)->icsk_ack.rcv_mss) {
 		u32 delta = tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr;
 		u32 delta_us;
 
@@ -5475,6 +5478,11 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb)
 				tcp_ack(sk, skb, 0);
 				__kfree_skb(skb);
 				tcp_data_snd_check(sk);
+				/* When receiving pure ack in fast path, update
+				 * last ts ecr directly instead of calling
+				 * tcp_rcv_rtt_measure_ts()
+				 */
+				tp->rcv_rtt_last_tsecr = tp->rx_opt.rcv_tsecr;
 				return;
 			} else { /* Header too small */
 				TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS);
-- 
2.18.0.rc1.244.gcf134e6275-goog

^ permalink raw reply related

* Re: [PATCH net] ipvlan: call dev_change_flags when reset ipvlan mode
From: Hangbin Liu @ 2018-06-20  3:22 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Stefano Brivio, Paolo Abeni,
	David Miller, Mahesh Bandewar
In-Reply-To: <CAM_iQpXeuU=Cxons5=centGNJzm0OaNU3Jj5hE91hvJH0o2-Eg@mail.gmail.com>

On Tue, Jun 19, 2018 at 02:10:18PM -0700, Cong Wang wrote:
> On Mon, Jun 18, 2018 at 7:04 AM, Hangbin Liu <liuhangbin@gmail.com> wrote:
> > @@ -94,10 +95,13 @@ static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval)
> >                         mdev->l3mdev_ops = NULL;
> >                 }
> >                 list_for_each_entry(ipvlan, &port->ipvlans, pnode) {
> > +                       flags = ipvlan->dev->flags;
> >                         if (nval == IPVLAN_MODE_L3 || nval == IPVLAN_MODE_L3S)
> > -                               ipvlan->dev->flags |= IFF_NOARP;
> > +                               dev_change_flags(ipvlan->dev,
> > +                                                flags | IFF_NOARP);
> >                         else
> > -                               ipvlan->dev->flags &= ~IFF_NOARP;
> > +                               dev_change_flags(ipvlan->dev,
> > +                                                flags & ~IFF_NOARP);
> 
> You need to check the return value of dev_change_flags().

Hi Wang Cong,

The only case dev_change_flags() return an err is when we change IFF_UP flag.
Since we only set/reset IFF_NOARP, do you think we still need to check the
return value?

Thanks
Hangbin

^ permalink raw reply

* Re: [PATCH] PCI: allow drivers to limit the number of VFs to 0
From: Jakub Kicinski @ 2018-06-20  2:56 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, linux-pci, netdev, Sathya Perla, Felix Manlunas,
	alexander.duyck, john.fastabend, Jacob Keller, Donald Dutile,
	oss-drivers, Christoph Hellwig
In-Reply-To: <20180619213715.GC33049@bhelgaas-glaptop.roam.corp.google.com>

On Tue, 19 Jun 2018 16:37:15 -0500, Bjorn Helgaas wrote:
> On Fri, May 25, 2018 at 09:02:23AM -0500, Bjorn Helgaas wrote:
> > On Thu, May 24, 2018 at 06:20:15PM -0700, Jakub Kicinski wrote:  
> > > Hi Bjorn!
> > > 
> > > On Thu, 24 May 2018 18:57:48 -0500, Bjorn Helgaas wrote:  
> > > > On Mon, Apr 02, 2018 at 03:46:52PM -0700, Jakub Kicinski wrote:  
> > > > > Some user space depends on enabling sriov_totalvfs number of VFs
> > > > > to not fail, e.g.:
> > > > > 
> > > > > $ cat .../sriov_totalvfs > .../sriov_numvfs
> > > > > 
> > > > > For devices which VF support depends on loaded FW we have the
> > > > > pci_sriov_{g,s}et_totalvfs() API.  However, this API uses 0 as
> > > > > a special "unset" value, meaning drivers can't limit sriov_totalvfs
> > > > > to 0.  Remove the special values completely and simply initialize
> > > > > driver_max_VFs to total_VFs.  Then always use driver_max_VFs.
> > > > > Add a helper for drivers to reset the VF limit back to total.    
> > > > 
> > > > I still can't really make sense out of the changelog.
> > > >
> > > > I think part of the reason it's confusing is because there are two
> > > > things going on:
> > > > 
> > > >   1) You want this:
> > > >   
> > > >        pci_sriov_set_totalvfs(dev, 0);
> > > >        x = pci_sriov_get_totalvfs(dev) 
> > > > 
> > > >      to return 0 instead of total_VFs.  That seems to connect with
> > > >      your subject line.  It means "sriov_totalvfs" in sysfs could be
> > > >      0, but I don't know how that is useful (I'm sure it is; just
> > > >      educate me :))  
> > > 
> > > Let me just quote the bug report that got filed on our internal bug
> > > tracker :)
> > > 
> > >   When testing Juju Openstack with Ubuntu 18.04, enabling SR-IOV causes
> > >   errors because Juju gets the sriov_totalvfs for SR-IOV-capable device
> > >   then tries to set that as the sriov_numvfs parameter.
> > > 
> > >   For SR-IOV incapable FW, the sriov_totalvfs parameter should be 0, 
> > >   but it's set to max.  When FW is switched to flower*, the correct 
> > >   sriov_totalvfs value is presented.
> > > 
> > > * flower is a project name  
> > 
> > From the point of view of the PCI core (which knows nothing about
> > device firmware and relies on the architected config space described
> > by the PCIe spec), this sounds like an erratum: with some firmware
> > installed, the device is not capable of SR-IOV, but still advertises
> > an SR-IOV capability with "TotalVFs > 0".
> > 
> > Regardless of whether that's an erratum, we do allow PF drivers to use
> > pci_sriov_set_totalvfs() to limit the number of VFs that may be
> > enabled by writing to the PF's "sriov_numvfs" sysfs file.
> > 
> > But the current implementation does not allow a PF driver to limit VFs
> > to 0, and that does seem nonsensical.
> >   
> > > My understanding is OpenStack uses sriov_totalvfs to determine how many
> > > VFs can be enabled, looks like this is the code:
> > > 
> > > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n464
> > >   
> > > >   2) You're adding the pci_sriov_reset_totalvfs() interface.  I'm not
> > > >      sure what you intend for this.  Is *every* driver supposed to
> > > >      call it in .remove()?  Could/should this be done in the core
> > > >      somehow instead of depending on every driver?  
> > > 
> > > Good question, I was just thinking yesterday we may want to call it
> > > from the core, but I don't think it's strictly necessary nor always
> > > sufficient (we may reload FW without re-probing).
> > > 
> > > We have a device which supports different number of VFs based on the FW
> > > loaded.  Some legacy FWs does not inform the driver how many VFs it can
> > > support, because it supports max.  So the flow in our driver is this:
> > > 
> > > load_fw(dev);
> > > ...
> > > max_vfs = ask_fw_for_max_vfs(dev);
> > > if (max_vfs >= 0)
> > > 	return pci_sriov_set_totalvfs(dev, max_vfs);
> > > else /* FW didn't tell us, assume max */
> > > 	return pci_sriov_reset_totalvfs(dev); 
> > > 
> > > We also reset the max on device remove, but that's not strictly
> > > necessary.
> > > 
> > > Other users of pci_sriov_set_totalvfs() always know the value to set
> > > the total to (either always get it from FW or it's a constant).
> > > 
> > > If you prefer we can work out the correct max for those legacy cases in
> > > the driver as well, although it seemed cleaner to just ask the core,
> > > since it already has total_VFs value handy :)
> > >   
> > > > I'm also having a hard time connecting your user-space command example
> > > > with the rest of this.  Maybe it will make more sense to me tomorrow
> > > > after some coffee.  
> > > 
> > > OpenStack assumes it will always be able to set sriov_numvfs to
> > > sriov_totalvfs, see this 'if':
> > > 
> > > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n512  
> > 
> > Thanks for educating me.  I think there are two issues here that we
> > can separate.  I extracted the patch below for the first.
> > 
> > The second is the question of resetting driver_max_VFs.  I think we
> > currently have a general issue in the core:
> > 
> >   - load PF driver 1
> >   - driver calls pci_sriov_set_totalvfs() to reduce driver_max_VFs
> >   - unload PF driver 1
> >   - load PF driver 2
> > 
> > Now driver_max_VFs is still stuck at the lower value set by driver 1.
> > I don't think that's the way this should work.
> > 
> > I guess this is partly a consequence of setting driver_max_VFs in
> > sriov_init(), which is called before driver attach and should only
> > depend on hardware characteristics, so it is related to the patch
> > below.  But I think we should fix it in general, not just for
> > netronome.  
> 
> Hi Jakub, the patch below is in v4.18-rc1 as 8d85a7a4f2c9 ("PCI/IOV:
> Allow PF drivers to limit total_VFs to 0").  If there's more we need
> to do here, would you mind rebasing what's left to v4.18-rc1 and
> reposting it?

Hi Bjorn!

Thanks a lot for looking into this!  My understanding is that we have
two ways forward:
 - add a pci_sriov_reset_totalvfs() helper for drivers to call;
 - make the core reset the totalVFs after driver is detached.

IMHO second option is better.  I went ahead and posted:

https://patchwork.ozlabs.org/patch/931210/

This works very well for nfp driver (modulo minor clean ups but I'd
rather route those via networking trees to avoid conflicts).

> > commit 4a338bc6f94b9ad824ac944f5dfc249d6838719c
> > Author: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Date:   Fri May 25 08:18:34 2018 -0500
> > 
> >     PCI/IOV: Allow PF drivers to limit total_VFs to 0
> >     
> >     Some SR-IOV PF drivers implement .sriov_configure(), which allows
> >     user-space to enable VFs by writing the desired number of VFs to the sysfs
> >     "sriov_numvfs" file (see sriov_numvfs_store()).
> >     
> >     The PCI core limits the number of VFs to the TotalVFs advertised by the
> >     device in its SR-IOV capability.  The PF driver can limit the number of VFs
> >     to even fewer (it may have pre-allocated data structures or knowledge of
> >     device limitations) by calling pci_sriov_set_totalvfs(), but previously it
> >     could not limit the VFs to 0.
> >     
> >     Change pci_sriov_get_totalvfs() so it always respects the VF limit imposed
> >     by the PF driver, even if the limit is 0.
> >     
> >     This sequence:
> >     
> >       pci_sriov_set_totalvfs(dev, 0);
> >       x = pci_sriov_get_totalvfs(dev);
> >     
> >     previously set "x" to TotalVFs from the SR-IOV capability.  Now it will set
> >     "x" to 0.
> >     
> >     Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> >     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > 
> > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > index 192b82898a38..d0d73dbbd5ca 100644
> > --- a/drivers/pci/iov.c
> > +++ b/drivers/pci/iov.c
> > @@ -469,6 +469,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
> >  	iov->nres = nres;
> >  	iov->ctrl = ctrl;
> >  	iov->total_VFs = total;
> > +	iov->driver_max_VFs = total;
> >  	pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, &iov->vf_device);
> >  	iov->pgsz = pgsz;
> >  	iov->self = dev;
> > @@ -827,10 +828,7 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
> >  	if (!dev->is_physfn)
> >  		return 0;
> >  
> > -	if (dev->sriov->driver_max_VFs)
> > -		return dev->sriov->driver_max_VFs;
> > -
> > -	return dev->sriov->total_VFs;
> > +	return dev->sriov->driver_max_VFs;
> >  }
> >  EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
> >    

^ permalink raw reply

* Re: [PATCH v2] net: ethernet: stmmac: dwmac-rk: Add GMAC support for PX30
From: David Wu @ 2018-06-20  2:40 UTC (permalink / raw)
  To: Heiko Stübner
  Cc: davem, robh+dt, mark.rutland, huangtao, netdev, linux-arm-kernel,
	linux-rockchip, linux-kernel, 张晴
In-Reply-To: <2582999.2hZx6CH9S6@diego>

Hi Heiko,

在 2018年06月14日 16:30, Heiko Stübner 写道:
> Am Donnerstag, 14. Juni 2018, 10:14:31 CEST schrieb David Wu:
>> Hi Heiko,
>>
>> 在 2018年06月14日 15:54, Heiko Stübner 写道:
>>> I don't see that new clock documented in the dt-binding.
>>> Also, which clock from the clock-controller does this connect to?
>>
>> The clock is the "SCLK_GMAC_RMII" at the clock-controller, which could
>> be set rate by the link speed.
> 
> Hmm, while these huge number of clocks are somewhat strange,
> shouldn't it be named something with _rmii instead of _speed then?

Okay, it is better to be named _speed.

> 
> Also, I don't see any clk_enable action for that new clock, so you could
> end up with being off?

The new speed is the parent of the clk_tx_rx, to enable/disable 
clk_tx_rx, the new clock would be also enabled/disabled.

> 
> And someone could convert the driver to use the new clk-bulk APIs [0],
> so the large number of clk_prepare_enable calls would be a bit
> trimmed down.
> 
> 
> Heiko
> 
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/clk-bulk.c
> 
> 
> 
> 
> 

^ permalink raw reply

* [PATCH net] net: sungem: fix rx checksum support
From: Eric Dumazet @ 2018-06-20  2:18 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Meelis Roos, Mathieu Malaterre, Andreas Schwab,
	Eric Dumazet, Eric Dumazet

After commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE
are friends"), sungem owners reported the infamous "eth0: hw csum failure"
message.

CHECKSUM_COMPLETE has in fact never worked for this driver, but this
was masked by the fact that upper stacks had to strip the FCS, and
therefore skb->ip_summed was set back to CHECKSUM_NONE before
my recent change.

Driver configures a number of bytes to skip when the chip computes
the checksum, and for some reason only half of the Ethernet header
was skipped.

Then a second problem is that we should strip the FCS by default,
unless the driver is updated to eventually support NETIF_F_RXFCS in
the future.

Finally, a driver should check if NETIF_F_RXCSUM feature is enabled
or not, so that the admin can turn off rx checksum if wanted.

Many thanks to Andreas Schwab and Mathieu Malaterre for their
help in debugging this issue.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Mathieu Malaterre <malat@debian.org>
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Andreas Schwab <schwab@linux-m68k.org>
---
 drivers/net/ethernet/sun/sungem.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..b9221fc1674dfa0ef17a43f8ff86d700a1ae514f 100644
--- a/drivers/net/ethernet/sun/sungem.c
+++ b/drivers/net/ethernet/sun/sungem.c
@@ -60,8 +60,7 @@
 #include <linux/sungem_phy.h>
 #include "sungem.h"
 
-/* Stripping FCS is causing problems, disabled for now */
-#undef STRIP_FCS
+#define STRIP_FCS
 
 #define DEFAULT_MSG	(NETIF_MSG_DRV		| \
 			 NETIF_MSG_PROBE	| \
@@ -435,7 +434,7 @@ static int gem_rxmac_reset(struct gem *gp)
 	writel(desc_dma & 0xffffffff, gp->regs + RXDMA_DBLOW);
 	writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK);
 	val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
-	       ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
+	       (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
 	writel(val, gp->regs + RXDMA_CFG);
 	if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN)
 		writel(((5 & RXDMA_BLANK_IPKTS) |
@@ -760,7 +759,6 @@ static int gem_rx(struct gem *gp, int work_to_do)
 	struct net_device *dev = gp->dev;
 	int entry, drops, work_done = 0;
 	u32 done;
-	__sum16 csum;
 
 	if (netif_msg_rx_status(gp))
 		printk(KERN_DEBUG "%s: rx interrupt, done: %d, rx_new: %d\n",
@@ -855,9 +853,13 @@ static int gem_rx(struct gem *gp, int work_to_do)
 			skb = copy_skb;
 		}
 
-		csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
-		skb->csum = csum_unfold(csum);
-		skb->ip_summed = CHECKSUM_COMPLETE;
+		if (likely(dev->features & NETIF_F_RXCSUM)) {
+			__sum16 csum;
+
+			csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
+			skb->csum = csum_unfold(csum);
+			skb->ip_summed = CHECKSUM_COMPLETE;
+		}
 		skb->protocol = eth_type_trans(skb, gp->dev);
 
 		napi_gro_receive(&gp->napi, skb);
@@ -1761,7 +1763,7 @@ static void gem_init_dma(struct gem *gp)
 	writel(0, gp->regs + TXDMA_KICK);
 
 	val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
-	       ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
+	       (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
 	writel(val, gp->regs + RXDMA_CFG);
 
 	writel(desc_dma >> 32, gp->regs + RXDMA_DBHI);
@@ -2985,8 +2987,8 @@ static int gem_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	pci_set_drvdata(pdev, dev);
 
 	/* We can do scatter/gather and HW checksum */
-	dev->hw_features = NETIF_F_SG | NETIF_F_HW_CSUM;
-	dev->features |= dev->hw_features | NETIF_F_RXCSUM;
+	dev->hw_features = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM;
+	dev->features = dev->hw_features;
 	if (pci_using_dac)
 		dev->features |= NETIF_F_HIGHDMA;
 
-- 
2.18.0.rc1.244.gcf134e6275-goog

^ permalink raw reply related

* Re: [PATCH] selftests: net: add config fragments
From: Shannon Nelson @ 2018-06-20  1:42 UTC (permalink / raw)
  To: Anders Roxell, davem, shuah, fw; +Cc: netdev, linux-kselftest, linux-kernel
In-Reply-To: <20180619164111.30785-1-anders.roxell@linaro.org>

On 6/19/2018 9:41 AM, Anders Roxell wrote:
> Add fragments to pass bridge and vlan tests.
> 
> Fixes: 33b01b7b4f19 ("selftests: add rtnetlink test script")
> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
> ---
> 
> Hi,
> 
> net/rtnetlink.sh still fails on tc hbt hierarchy, addrlabel and ipsec:
> Error: Specified qdisc not found.
> RTNETLINK answers: No such file or directory
> Error: Parent Qdisc doesn't exists.
> We have an error talking to the kernel, -1
> Error: Parent Qdisc doesn't exists.
> We have an error talking to the kernel, -1
> Error: Parent Qdisc doesn't exists.
> We have an error talking to the kernel, -1
> Error: Parent Qdisc doesn't exists.
> We have an error talking to the kernel, -1
> Error: Parent Qdisc doesn't exists.
> We have an error talking to the kernel, -1
> Error: Parent Qdisc doesn't exists.
> We have an error talking to the kernel, -1
> Error: Invalid handle.
> FAIL: tc htb hierarchy
> 
> FAIL: ipv6 addrlabel
> 
> FAIL: can't add fou port 7777, skipping test
> RTNETLINK answers: Operation not supported
> FAIL: can't add macsec interface, skipping test
> RTNETLINK answers: Protocol not supported
> RTNETLINK answers: No such process
> RTNETLINK answers: No such process
> ./rtnetlink.sh: line 527:  5356 Terminated              ip x m >
> $tmpfile
> FAIL: ipsec
> 
> 
> I'm using iproute2 tag: 4.17 and tried the qdisc command from the
> function kci_test_tc in net/rtnetlink.sh:
> $ tc qdisc add dev lo root handle 1: htb
> Error: Specified qdisc not found.
> 
> For kci_test_addrlabel it fails on this row:
> ip addrlabel list |grep -q "prefix dead::/64 dev lo label 1"
> 
> Any idea why these three fails?


The "Terminated" line is there because "ip x m" had been put into the 
background, and at the end of the ipsec test it is killed.  I can try to 
play some games with exec and redirection to make that go away.

The "FAIL: ipsec" is partly because the test isn't smart enough to look 
to see if there is any offload actually available to test.  I'm working 
on a patch to netdevsim to add the ipsec-offload in order to have a 
better test.  And yes, this should say "ipsec-offload", not "ipsec".

I don't know about the qdisk or addrlabel issues.

Cheers,
sln


> 
> Cheers,
> Anders
> 
>   tools/testing/selftests/net/config | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config
> index 7ba089b33e8b..cd3a2f1545b5 100644
> --- a/tools/testing/selftests/net/config
> +++ b/tools/testing/selftests/net/config
> @@ -12,3 +12,5 @@ CONFIG_NET_IPVTI=y
>   CONFIG_INET6_XFRM_MODE_TUNNEL=y
>   CONFIG_IPV6_VTI=y
>   CONFIG_DUMMY=y
> +CONFIG_BRIDGE=y
> +CONFIG_VLAN_8021Q=y
> 

^ permalink raw reply

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
From: Andrew Collins @ 2018-06-20  0:56 UTC (permalink / raw)
  To: David Miller; +Cc: f.fainelli, pablo, netfilter-devel, netdev, Steffen Klassert
In-Reply-To: <20180614.165549.1790121217625223670.davem@davemloft.net>

On Thu, Jun 14, 2018 at 5:55 PM, David Miller <davem@davemloft.net> wrote:
> And guess what?  Then millions of possibilities would have been
> openned up, rather than just this one special case.
>
> So, I ask, please see the larger picture.

+cc netdev/etc

This is perhaps unrelated to the topic at hand, but as someone who's shipped
a bunch of devices over the years using the linux kernel forwarding path and
needs performance but wants to avoid moving to out of tree userspace offload
for all the reasons that you and many others have stated, is the long
term vision that the existing kernel forwarding path will transparently take
advantage of eBPF (ala bpfilter), or that users will write custom/individualized
eBPF forwarding paths for their usecases as necessary?

I (and I suspect many others) will start on the latter anyways, I'm just curious
whether it's desired/expected that such custom fastpath users will eventually
be rolled back into/replaced by a transparent upstream in-kernel equivalent.

^ permalink raw reply

* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
From: Eric Dumazet @ 2018-06-20  0:25 UTC (permalink / raw)
  To: Saeed Mahameed, eric.dumazet@gmail.com, kafai@fb.com,
	Tariq Toukan
  Cc: netdev@vger.kernel.org, edumazet@google.com
In-Reply-To: <deda277de538f2083fb23479f26afa03ffc79644.camel@mellanox.com>

On 06/19/2018 11:05 AM, Saeed Mahameed wrote:

> this is only true for XDP setup, for non XDP max stride_size can only
> be around ~3k and only for mtu > ~6k
> 
> For XDP setup you suggested:
> -               priv->frag_info[0].frag_size = eff_mtu;
> +               priv->frag_info[0].frag_size = PAGE_SIZE;
> 
> currently the condition is:
> 
> release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;
> 
> so my solution and yours have the same problem you described above.
> 
> the problem is not with the initial values or with stride/farg size
> math, it just that in XDP we shouldn't reuse at ALL. I agree with you
> that we need to optimize and maybe for PAGE_SIZE > 8k we need to allow
> XDP setup to reuses. but for now there is a data corruption to handle.

Sure, we all agree there is a bug to fix.

The way you are fixing it is kind of illogical.

The NIC can use a frag if its _size_ is big enough to receive the frame.

The _stride_  is an abstraction created by the driver to report an estimation of the _truesize_,
or memory consumption, so that linux can better track overall memory usage.

For example, if MTU=1500, the size of the fragment is 1536 bytes, but since we can put only
2 fragments per 4KB page (on x86), we declare the _stride_ to be 2048 bytes.

Declaring that a final blob of a page, being 1600 bytes, not able to receive a frame because
_stride_ is 2048 is illogical and waste resources.

^ permalink raw reply

* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Eric Dumazet @ 2018-06-20  0:19 UTC (permalink / raw)
  To: Michael Ellerman, Eric Dumazet, Mathieu Malaterre,
	David S. Miller
  Cc: Eric Dumazet, linux-kernel, netdev
In-Reply-To: <87muvqwc06.fsf@concordia.ellerman.id.au>

On 06/19/2018 05:13 PM, Michael Ellerman wrote:

> Just so I'm clear, this turned out to be a driver/hw problem rather than
> the arch csum implementation?

Yes, that was a driver bug.

I will send an official patch to fix this.

You guys will have faster RX, since CHECKSUM_COMPLETE will finally be used up to TCP/UDP stacks.

^ permalink raw reply

* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Michael Ellerman @ 2018-06-20  0:13 UTC (permalink / raw)
  To: Eric Dumazet, Mathieu Malaterre, David S. Miller
  Cc: Eric Dumazet, linux-kernel, netdev
In-Reply-To: <fbb95c11-240c-1a11-0a62-0483908c577e@gmail.com>

Eric Dumazet <eric.dumazet@gmail.com> writes:

> On 06/15/2018 11:56 AM, Mathieu Malaterre wrote:
>> This reverts commit 88078d98d1bb085d72af8437707279e203524fa5.
>> 
>> It causes regressions for people using chips driven by the sungem
>> driver. Suspicion is that the skb->csum value isn't being adjusted
>> properly.
>> 
>> Symptoms as seen on G4+sungem are:
>> 
>> [   34.023281] eth0: hw csum failure
>> [   34.023438] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #2
>> [   34.023618] Call Trace:
>> [   34.023707] [dffedbd0] [c069ddac] __skb_checksum_complete+0xf0/0x108 (unreliable)
>> [   34.023948] [dffedbf0] [c0777a70] tcp_v4_rcv+0x604/0xe00
>> [   34.024118] [dffedc70] [c0731624] ip_local_deliver_finish+0xa8/0x3c4
>> [   34.024315] [dffedcb0] [c0732430] ip_local_deliver+0xf0/0x154
>> [   34.024493] [dffedcf0] [c07328dc] ip_rcv+0x448/0x774
>> [   34.024653] [dffedd50] [c06aeae0] __netif_receive_skb_core+0x5e8/0x1184
>> [   34.024857] [dffedde0] [c06bba20] napi_gro_receive+0x160/0x22c
>> [   34.025044] [dffede10] [e14b2590] gem_poll+0x7fc/0x1ac0 [sungem]
>> [   34.025228] [dffedee0] [c06bacf0] net_rx_action+0x34c/0x618
>> [   34.025402] [dffedf60] [c07fd27c] __do_softirq+0x16c/0x5f0
>> [   34.025575] [dffedfd0] [c0064c7c] irq_exit+0x110/0x1a8
>> [   34.025738] [dffedff0] [c0016170] call_do_irq+0x24/0x3c
>> [   34.025903] [c0cf7e80] [c0009a84] do_IRQ+0x98/0x1a0
>> [   34.026055] [c0cf7eb0] [c001b474] ret_from_except+0x0/0x14
>> [   34.026225] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
>>                    LR = arch_cpu_idle+0x30/0x78
>> [   34.026510] [c0cf7f70] [c0cf6000] 0xc0cf6000 (unreliable)
>> [   34.026682] [c0cf7f80] [c00a3868] do_idle+0xc4/0x158
>> [   34.026835] [c0cf7fb0] [c00a3ab0] cpu_startup_entry+0x20/0x28
>> [   34.027013] [c0cf7fc0] [c0998820] start_kernel+0x47c/0x490
>> [   34.027181] [c0cf7ff0] [00003444] 0x3444
>> 
>> See commit 7ce5a27f2ef8 ("Revert "net: Handle CHECKSUM_COMPLETE more
>> adequately in pskb_trim_rcsum()."") for previous reference.
>
> This fix seems to hide a bug in csum functions on this architecture.
>
> Or a bug on this NIC when receiving a small packet (less than 60 bytes).
> Maybe the padding bytes are not included in NIC provided csum, and not 0.

Just so I'm clear, this turned out to be a driver/hw problem rather than
the arch csum implementation?

cheers

^ permalink raw reply

* Re: [PATCH] bpfilter: ignore binary files
From: David Miller @ 2018-06-20  0:08 UTC (permalink / raw)
  To: mcroce; +Cc: netdev, ast
In-Reply-To: <20180619152136.2340-1-mcroce@redhat.com>

From: Matteo Croce <mcroce@redhat.com>
Date: Tue, 19 Jun 2018 17:21:36 +0200

> net/bpfilter/bpfilter_umh is a binary file generated when bpfilter is
> enabled, add it to .gitignore to avoid committing it.
> 
> Fixes: d2ba09c17a064 ("net: add skeleton of bpfilter kernel module")
> Signed-off-by: Matteo Croce <mcroce@redhat.com>

Applied.

^ permalink raw reply

* Re: [PATCH] bpfilter: fix build error
From: David Miller @ 2018-06-20  0:08 UTC (permalink / raw)
  To: mcroce; +Cc: netdev, ast
In-Reply-To: <20180619151620.1912-1-mcroce@redhat.com>

From: Matteo Croce <mcroce@redhat.com>
Date: Tue, 19 Jun 2018 17:16:20 +0200

> bpfilter Makefile assumes that the system locale is en_US, and the
> parsing of objdump output fails.
> Set LC_ALL=C and, while at it, rewrite the objdump parsing so it spawns
> only 2 processes instead of 7.
> 
> Fixes: d2ba09c17a064 ("net: add skeleton of bpfilter kernel module")
> Signed-off-by: Matteo Croce <mcroce@redhat.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net/usb/drivers: Remove useless hrtimer_active check
From: David Miller @ 2018-06-20  0:07 UTC (permalink / raw)
  To: daniel.lezcano; +Cc: oliver, linux-usb, netdev, linux-kernel
In-Reply-To: <1529417671-21384-1-git-send-email-daniel.lezcano@linaro.org>

From: Daniel Lezcano <daniel.lezcano@linaro.org>
Date: Tue, 19 Jun 2018 16:14:30 +0200

> The code does:
> 
>  if (hrtimer_active(&t))
>     hrtimer_cancel(&t);
> 
> However, hrtimer_cancel() checks if the timer is active, so the
> test above is pointless.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

Applied.

^ permalink raw reply

* Re: [PATCH net] net/sched: act_ife: preserve the action control in case of error
From: David Miller @ 2018-06-20  0:06 UTC (permalink / raw)
  To: dcaratti; +Cc: jhs, xiyou.wangcong, netdev
In-Reply-To: <717d0682de4fb24ed6818f2bf264202d22e1e8be.1529415179.git.dcaratti@redhat.com>

From: Davide Caratti <dcaratti@redhat.com>
Date: Tue, 19 Jun 2018 15:45:50 +0200

> in the following script
> 
>  # tc actions add action ife encode allow prio pass index 42
>  # tc actions replace action ife encode allow tcindex drop index 42
> 
> the action control should remain equal to 'pass', if the kernel failed
> to replace the TC action. Pospone the assignment of the action control,
> to ensure it is not overwritten in the error path of tcf_ife_init().
> 
> Fixes: ef6980b6becb ("introduce IFE action")
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net] net/sched: act_ife: fix recursive lock and idr leak
From: David Miller @ 2018-06-20  0:06 UTC (permalink / raw)
  To: dcaratti; +Cc: jhs, xiyou.wangcong, netdev
In-Reply-To: <40b45f70ef007b222b36a4676174e597c41d697f.1529415169.git.dcaratti@redhat.com>

From: Davide Caratti <dcaratti@redhat.com>
Date: Tue, 19 Jun 2018 15:39:46 +0200

> a recursive lock warning [1] can be observed with the following script,
 ...
> in case the kernel was unable to run the last command (e.g. because of
> the impossibility to load 'act_meta_skbtcindex'). For a similar reason,
> the kernel can leak idr in the error path of tcf_ife_init(), because
> tcf_idr_release() is not called after successful idr reservation:
 ...
> Since tcfa_lock is already taken when the action is being edited, a call
> to tcf_idr_release() wrongly makes tcf_idr_cleanup() take the same lock
> again. On the other hand, tcf_idr_release() needs to be called in the
> error path of tcf_ife_init(), to undo the last tcf_idr_create() invocation.
> Fix both problems in tcf_ife_init().
> Since the cleanup() routine can now be called when ife->params is NULL,
> also add a NULL pointer check to avoid calling kfree_rcu(NULL, rcu).
> 
>  [1]
 ...
> Fixes: 4e8c86155010 ("net sched: net sched: ife action fix late binding")
> Fixes: ef6980b6becb ("introduce IFE action")
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

Applied and queued up for -stable.

^ permalink raw reply

* [PATCH] 6lowpan: iphc: reset mac_header after decompress to fix panic
From: Michael Scott @ 2018-06-19 23:44 UTC (permalink / raw)
  Cc: Michael Scott, Alexander Aring, Jukka Rissanen, David S. Miller,
	linux-bluetooth, linux-wpan, netdev, linux-kernel

After decompression of 6lowpan socket data, an IPv6 header is inserted
before the existing socket payload.  After this, we reset the
network_header value of the skb to account for the difference in payload
size from prior to decompression + the addition of the IPv6 header.

However, we fail to reset the mac_header value.

Leaving the mac_header value untouched here, can cause a calculation
error in net/packet/af_packet.c packet_rcv() function when an
AF_PACKET socket is opened in SOCK_RAW mode for use on a 6lowpan
interface.

On line 2088, the data pointer is moved backward by the value returned
from skb_mac_header().  If skb->data is adjusted so that it is before
the skb->head pointer (which can happen when an old value of mac_header
is left in place) the kernel generates a panic in net/core/skbuff.c
line 1717.

This panic can be generated by BLE 6lowpan interfaces (such as bt0) and
802.15.4 interfaces (such as lowpan0) as they both use the same 6lowpan
sources for compression and decompression.

Signed-off-by: Michael Scott <michael@opensourcefoundries.com>
---
 net/6lowpan/iphc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/6lowpan/iphc.c b/net/6lowpan/iphc.c
index 6b1042e21656..52fad5dad9f7 100644
--- a/net/6lowpan/iphc.c
+++ b/net/6lowpan/iphc.c
@@ -770,6 +770,7 @@ int lowpan_header_decompress(struct sk_buff *skb, const struct net_device *dev,
 		hdr.hop_limit, &hdr.daddr);

 	skb_push(skb, sizeof(hdr));
+	skb_reset_mac_header(skb);
 	skb_reset_network_header(skb);
 	skb_copy_to_linear_data(skb, &hdr, sizeof(hdr));

-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH] net: propagate dev_get_valid_name return code
From: David Miller @ 2018-06-19 23:38 UTC (permalink / raw)
  To: lirongqing; +Cc: netdev
In-Reply-To: <1529400197-24165-1-git-send-email-lirongqing@baidu.com>

From: Li RongQing <lirongqing@baidu.com>
Date: Tue, 19 Jun 2018 17:23:17 +0800

> if dev_get_valid_name failed, propagate its return code
> 
> and remove the setting err to ENODEV, it will be set to
> 0 again before dev_change_net_namespace exits.
> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>

Applied, thank you.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox